amoeba commented on code in PR #41135:
URL: https://github.com/apache/arrow/pull/41135#discussion_r1560137674
##########
docs/source/python/install.rst:
##########
@@ -93,3 +100,41 @@ a custom path to the database from Python:
>>> import pyarrow as pa
>>> pa.set_timezone_db_path("custom_path")
+
+
+.. _python-conda-differences:
+
+Differences between conda-forge packages
+----------------------------------------
+
+PyArrow is packaged on `conda-forge <https://conda-forge.org/>`_ as three
+separate packages, each providing varying levels of functionality. This is in
+contrast to PyPi, where only a single PyArrow package is provided.
+
+The purpose of this split is to minimize the size of the installed package for
+most users (``pyarrow``), provide a smaller, minimal package for specialized use
+cases (``pyarrow-core``), while still providing a complete package for users who
+require it (``pyarrow-all``).
+
+The table below lists the functionality provided by each package and may be
+useful when deciding to use one package over another:
+
++------------+------------------------------+------------------------------+------------------------------+
+| Component  | pyarrow                      | pyarrow-core                 | pyarrow-all                  |
++============+==============================+==============================+==============================+
+| Core       | :fas:`check;sd-text-success` | :fas:`check;sd-text-success` | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Parquet    | :fas:`check;sd-text-success` |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Datasets   | :fas:`check;sd-text-success` |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Acero      | :fas:`check;sd-text-success` |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Substrait  | :fas:`check;sd-text-success` |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Flight     |                              |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Flight SQL |                              |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Gandiva    |                              |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
Review Comment:
The rows in the table above are based on @raulcd's breakdown on the mailing
list, but I think we want to be careful about what's listed here. I think the
best approach would be for this table to align fully with the submodules
exposed in PyArrow, since that's what users are most familiar with. We might
even consider renaming the "Component" column to "Module" and using the literal
module names, i.e., `parquet` instead of Parquet, so it's clear we're talking
about whether `import pyarrow.parquet` works or not.
This would mean the table isn't complete yet (json, csv, filesystems, orc,
more?).
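
To make the "can I import this module or not" framing concrete, here is a minimal sketch of how a user could check which submodules a given install exposes. The helper name `importable_submodules` is hypothetical (it is not part of PyArrow), and the list of submodule names is my assumption about what a module-based table might cover, not a settled list:

```python
import importlib


def importable_submodules(package, names):
    """Return {name: True/False} for whether `package.name` can be imported."""
    result = {}
    for name in names:
        try:
            importlib.import_module(f"{package}.{name}")
            result[name] = True
        except ImportError:
            # Also covers ModuleNotFoundError, e.g. when the package itself
            # or the optional component is not installed.
            result[name] = False
    return result


if __name__ == "__main__":
    # Assumed candidate rows for a "Module" column; adjust as needed.
    candidates = ["parquet", "dataset", "acero", "substrait", "flight",
                  "gandiva", "csv", "json", "fs", "orc"]
    for name, ok in importable_submodules("pyarrow", candidates).items():
        print(f"pyarrow.{name}: {'yes' if ok else 'no'}")
```

Running this in environments with `pyarrow-core`, `pyarrow`, and `pyarrow-all` installed would be one way to cross-check the rows of the table against what is actually importable.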