Re: [PR] GH-41105: [Python][Docs] Update PyArrow installation docs for conda package split [arrow]

via GitHub Wed, 10 Apr 2024 15:51:33 -0700


amoeba commented on code in PR #41135:
URL: https://github.com/apache/arrow/pull/41135#discussion_r1560137674



##########
docs/source/python/install.rst:
##########
@@ -93,3 +100,41 @@ a custom path to the database from Python:
 
    >>> import pyarrow as pa
    >>> pa.set_timezone_db_path("custom_path")
+
+
+.. _python-conda-differences:
+
+Differences between conda-forge packages
+----------------------------------------
+
+PyArrow is packaged on `conda-forge <https://conda-forge.org/>`_ as three
+separate packages, each providing varying levels of functionality. This is in
+contrast to PyPI, where only a single PyArrow package is provided.
+
+The purpose of this split is to minimize the size of the installed package for
+most users (``pyarrow``), provide a smaller, minimal package for specialized use
+cases (``pyarrow-core``), while still providing a complete package for users who
+require it (``pyarrow-all``).
+
+The table below lists the functionality provided by each package and may be
+useful when deciding to use one package over another:
+
++------------+------------------------------+------------------------------+------------------------------+
+| Component  | pyarrow                      | pyarrow-core                 | pyarrow-all                  |
++============+==============================+==============================+==============================+
+| Core       | :fas:`check;sd-text-success` | :fas:`check;sd-text-success` | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Parquet    | :fas:`check;sd-text-success` |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Datasets   | :fas:`check;sd-text-success` |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Acero      | :fas:`check;sd-text-success` |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Substrait  | :fas:`check;sd-text-success` |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Flight     |                              |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Flight SQL |                              |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+
+| Gandiva    |                              |                              | :fas:`check;sd-text-success` |
++------------+------------------------------+------------------------------+------------------------------+

Review Comment:
   The rows in the table above are based on @raulcd's breakdown on the mailing
   list, but I think we want to be careful about what's listed here. The best
   approach would be for this table to align fully with the submodules exposed
   in PyArrow, since that's what users are most familiar with. We might even
   consider renaming the "Component" column to "Module" and using the literal
   module names, e.g., `parquet` instead of Parquet, so it's clear we're talking
   about whether `import pyarrow.parquet` works.

   That would mean the table isn't complete yet (json, csv, filesystems, orc,
   more?).
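
A rough sketch of how a module-based table could be checked against an installed package: try importing each submodule and record whether it succeeds. The helper name `importable_submodules` and the candidate list are assumptions made for illustration; the list is only a guess built from the names mentioned in this thread, not an authoritative inventory of PyArrow submodules.

```python
import importlib


def importable_submodules(package, names):
    """Map each submodule name to True if ``import package.name`` succeeds."""
    result = {}
    for name in names:
        try:
            importlib.import_module(f"{package}.{name}")
        except ImportError:
            # Covers a missing package, a missing submodule, and a submodule
            # whose compiled dependencies are not installed.
            result[name] = False
        else:
            result[name] = True
    return result


if __name__ == "__main__":
    # Hypothetical candidate list; adjust to whatever the final table covers.
    candidates = [
        "parquet", "dataset", "acero", "substrait", "flight",
        "gandiva", "json", "csv", "fs", "orc",
    ]
    for name, ok in importable_submodules("pyarrow", candidates).items():
        print(f"pyarrow.{name}: {'yes' if ok else 'no'}")
```

Running this in environments with `pyarrow-core`, `pyarrow`, and `pyarrow-all` installed would show which rows differ between the packages. It actually imports each module rather than only looking it up, since a module file may be present while its compiled dependency is not.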


