This is an automated email from the ASF dual-hosted git repository.

jorisvandenbossche pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new aea10c2b59 GH-41480: [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705)
aea10c2b59 is described below

commit aea10c2b59043397639a80c7582a1d3e5c588125
Author: Joris Van den Bossche <[email protected]>
AuthorDate: Thu Jun 13 14:44:04 2024 +0200

    GH-41480: [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705)
    
    ### Rationale for this change
    
    Follow-up on https://github.com/apache/arrow/pull/41494 to update the
    Python development guide to reflect the change in how PyArrow is built:
    defaults for the various `PYARROW_BUILD_<component>` options are now set
    based on the `ARROW_<component>` setting. The current
    `PYARROW_WITH_<component>` environment variables keep working and can be
    used to override this default.
    
    * GitHub Issue: #41480
    
    Authored-by: Joris Van den Bossche <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
---
 docs/source/developers/python.rst | 95 ++++++++++++++++++++-------------------
 1 file changed, 49 insertions(+), 46 deletions(-)

diff --git a/docs/source/developers/python.rst b/docs/source/developers/python.rst
index e84cd25201..2f3e892ce8 100644
--- a/docs/source/developers/python.rst
+++ b/docs/source/developers/python.rst
@@ -397,18 +397,14 @@ Now, build pyarrow:
 .. code-block::
 
    $ pushd arrow/python
-   $ export PYARROW_WITH_PARQUET=1
-   $ export PYARROW_WITH_DATASET=1
    $ export PYARROW_PARALLEL=4
    $ python setup.py build_ext --inplace
    $ popd
 
-If you did build one of the optional components (in C++), you need to set the
-corresponding ``PYARROW_WITH_$COMPONENT`` environment variable to 1.
-
-Similarly, if you built with ``PARQUET_REQUIRE_ENCRYPTION`` (in C++), you
-need to set the corresponding ``PYARROW_WITH_PARQUET_ENCRYPTION`` environment
-variable to 1.
+If you did build one of the optional components in C++, the equivalent components
+will be enabled by default for building pyarrow. This default can be overridden
+by setting the corresponding ``PYARROW_WITH_$COMPONENT`` environment variable
+to 0 or 1, see :ref:`python-dev-env-variables` below.
 
 To set the number of threads used to compile PyArrow's C++/Cython components,
 set the ``PYARROW_PARALLEL`` environment variable.
@@ -551,7 +547,6 @@ Now, we can build pyarrow:
 .. code-block::
 
    $ pushd arrow\python
-   $ set PYARROW_WITH_PARQUET=1
    $ set CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
    $ python setup.py build_ext --inplace
    $ popd
@@ -601,46 +596,12 @@ Then run the unit tests with:
 Caveats
 -------
 
+.. _python-dev-env-variables:
+
 Relevant components and environment variables
 =============================================
 
-List of relevant Arrow CMake flags and corresponding environment variables
-to be used when building PyArrow are:
-
-.. list-table::
-   :widths: 30 30
-   :header-rows: 1
-
-   * - Arrow flags/options
-     - Corresponding environment variables for PyArrow
-   * - ``CMAKE_BUILD_TYPE``
-     - ``PYARROW_BUILD_TYPE`` (release, debug or relwithdebinfo)
-   * - ``ARROW_GCS``
-     - ``PYARROW_WITH_GCS``
-   * - ``ARROW_S3``
-     - ``PYARROW_WITH_S3``
-   * - ``ARROW_HDFS``
-     - ``PYARROW_WITH_HDFS``
-   * - ``ARROW_CUDA``
-     - ``PYARROW_WITH_CUDA``
-   * - ``ARROW_SUBSTRAIT``
-     - ``PYARROW_WITH_SUBSTRAIT``
-   * - ``ARROW_FLIGHT``
-     - ``PYARROW_WITH_FLIGHT``
-   * - ``ARROW_DATASET``
-     - ``PYARROW_WITH_DATASET``
-   * - ``ARROW_PARQUET``
-     - ``PYARROW_WITH_PARQUET``
-   * - ``PARQUET_REQUIRE_ENCRYPTION``
-     - ``PYARROW_WITH_PARQUET_ENCRYPTION``
-   * - ``ARROW_TENSORFLOW``
-     - ``PYARROW_WITH_TENSORFLOW``
-   * - ``ARROW_ORC``
-     - ``PYARROW_WITH_ORC``
-   * - ``ARROW_GANDIVA``
-     - ``PYARROW_WITH_GANDIVA``
-
-List of relevant environment variables that can also be used to build
+List of relevant environment variables that can be used to build
 PyArrow are:
 
 .. list-table::
@@ -650,6 +611,9 @@ PyArrow are:
    * - PyArrow environment variable
      - Description
      - Default value
+   * - ``PYARROW_BUILD_TYPE``
+     - Build type for PyArrow (release, debug or relwithdebinfo), sets ``CMAKE_BUILD_TYPE``
+     - ``release``
    * - ``PYARROW_CMAKE_GENERATOR``
      - Example: ``'Visual Studio 15 2017 Win64'``
      - ``''``
@@ -678,6 +642,45 @@ PyArrow are:
      - Number of processes used to compile PyArrow’s C++/Cython components
      - ``''``
 
+The components being disabled or enabled when building PyArrow is by default
+based on how Arrow C++ is built (i.e. it follows the ``ARROW_$COMPONENT`` flags).
+However, the ``PYARROW_WITH_$COMPONENT`` environment variables can still be used
+to override this when building PyArrow (e.g. to disable components, or to enforce
+certain components to be built):
+
+.. list-table::
+   :widths: 30 30
+   :header-rows: 1
+
+   * - Arrow flags/options
+     - Corresponding environment variables for PyArrow
+   * - ``ARROW_GCS``
+     - ``PYARROW_WITH_GCS``
+   * - ``ARROW_S3``
+     - ``PYARROW_WITH_S3``
+   * - ``ARROW_AZURE``
+     - ``PYARROW_WITH_AZURE``
+   * - ``ARROW_HDFS``
+     - ``PYARROW_WITH_HDFS``
+   * - ``ARROW_CUDA``
+     - ``PYARROW_WITH_CUDA``
+   * - ``ARROW_SUBSTRAIT``
+     - ``PYARROW_WITH_SUBSTRAIT``
+   * - ``ARROW_FLIGHT``
+     - ``PYARROW_WITH_FLIGHT``
+   * - ``ARROW_ACERO``
+     - ``PYARROW_WITH_ACERO``
+   * - ``ARROW_DATASET``
+     - ``PYARROW_WITH_DATASET``
+   * - ``ARROW_PARQUET``
+     - ``PYARROW_WITH_PARQUET``
+   * - ``PARQUET_REQUIRE_ENCRYPTION``
+     - ``PYARROW_WITH_PARQUET_ENCRYPTION``
+   * - ``ARROW_ORC``
+     - ``PYARROW_WITH_ORC``
+   * - ``ARROW_GANDIVA``
+     - ``PYARROW_WITH_GANDIVA``
+
 Deleting stale build artifacts
 ==============================
 
