Re: [PR] GH-41105: [Python][Docs] Update PyArrow installation docs for conda package split [arrow]

via GitHub Tue, 14 May 2024 04:06:05 -0700


jorisvandenbossche commented on code in PR #41135:
URL: https://github.com/apache/arrow/pull/41135#discussion_r1599824890



##########
docs/source/python/install.rst:
##########
@@ -107,17 +107,41 @@ a custom path to the database from Python:
 Differences between conda-forge packages
 ----------------------------------------
 
-PyArrow is packaged on `conda-forge <https://conda-forge.org/>`_ as three
+On `conda-forge <https://conda-forge.org/>`_, PyArrow is published as three
 separate packages, each providing varying levels of functionality. This is in
 contrast to PyPi, where only a single PyArrow package is provided.
 
 The purpose of this split is to minimize the size of the installed package for
 most users (``pyarrow``), provide a smaller, minimal package for specialized use
 cases (``pyarrow-core``), while still providing a complete package for users who
-require it (``pyarrow-all``).
+require it (``pyarrow-all``). What was historically ``pyarrow`` on
+`conda-forge <https://conda-forge.org/>`_ is now ``pyarrow-all``, though most
+users can continue using ``pyarrow``.
 
-The table below lists the functionality provided by each package and may be
-useful when deciding to use one package over another:
+The ``pyarrow-core`` package includes the following functionality:
+
+- :ref:`data`
+- :ref:`compute` (i.e., ``pyarrow.compute``)
+- :ref:`io`
+- :ref:`ipc` (i.e., ``pyarrow.ipc``)
+- :ref:`filesystem` (HDFS, S3, GCS, etc.)

Review Comment:
   If we mention that here, I think we should also say that those cloud
   filesystems are planned to be moved out of `pyarrow-core` in the next
   release, so you should install `pyarrow` if you want to rely on them
   being present.



