This is an automated email from the ASF dual-hosted git repository.
jorisvandenbossche pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 8394571423 ARROW-16582: [Python][Docs] Update Python build docs to include dataset
8394571423 is described below
commit 8394571423413265f4abc2a2b1e71814e20dfeb0
Author: Raúl Cumplido <[email protected]>
AuthorDate: Thu May 19 10:03:36 2022 +0200
ARROW-16582: [Python][Docs] Update Python build docs to include dataset
This PR updates the Python developer guide to build pyarrow with
`DATASET` enabled by default. It also fixes a minor path inconsistency:
the whole guide uses paths relative to the checkout (`arrow/python`,
`arrow/cpp`, `arrow/ci`), but one command referred to the absolute path
`/arrow`.
Closes #13187 from raulcd/ARROW-16582
Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
---
docs/source/developers/python.rst | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/docs/source/developers/python.rst b/docs/source/developers/python.rst
index c7a899e911..93971931c1 100644
--- a/docs/source/developers/python.rst
+++ b/docs/source/developers/python.rst
@@ -89,6 +89,8 @@ particular group, prepend ``only-`` instead, for example ``--only-parquet``.
The test groups currently include:
+* ``dataset``: Apache Arrow Dataset tests
+* ``flight``: Flight RPC tests
* ``gandiva``: tests for Gandiva expression compiler (uses LLVM)
* ``hdfs``: tests that use libhdfs or libhdfs3 to access the Hadoop filesystem
* ``hypothesis``: tests that use the ``hypothesis`` module for generating
@@ -100,7 +102,6 @@ The test groups currently include:
* ``plasma``: Plasma Object Store tests
* ``s3``: Tests for Amazon S3
* ``tensorflow``: Tests that involve TensorFlow
-* ``flight``: Flight RPC tests
Benchmarking
------------
@@ -264,6 +265,7 @@ created above (stored in ``$ARROW_HOME``):
$ cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_BUILD_TYPE=Debug \
+ -DARROW_DATASET=ON \
-DARROW_WITH_BZ2=ON \
-DARROW_WITH_ZLIB=ON \
-DARROW_WITH_ZSTD=ON \
@@ -283,6 +285,7 @@ There are a number of optional components that can can be switched ON by
adding flags with ``ON``:
* ``ARROW_CUDA``: Support for CUDA-enabled GPUs
+* ``ARROW_DATASET``: Support for Apache Arrow Dataset
* ``ARROW_FLIGHT``: Flight RPC framework
* ``ARROW_GANDIVA``: LLVM-based expression compiler
* ``ARROW_ORC``: Support for Apache ORC file format
@@ -335,7 +338,7 @@ Python executable which you are using.
For any other C++ build challenges, see :ref:`cpp-development`.
In case you may need to rebuild the C++ part due to errors in the process it is
-advisable to delete the build folder with command ``rm -rf /arrow/cpp/build``.
+advisable to delete the build folder with command ``rm -rf arrow/cpp/build``.
If the build has passed successfully and you need to rebuild due to latest pull
from git master, then this step is not needed.
@@ -345,6 +348,7 @@ Now, build pyarrow:
$ pushd arrow/python
$ export PYARROW_WITH_PARQUET=1
+ $ export PYARROW_WITH_DATASET=1
$ python setup.py build_ext --inplace
$ popd
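Taken together, the hunks above change the documented build flow in two
places: the C++ configure step gains ``-DARROW_DATASET=ON``, and the
pyarrow build gains ``PYARROW_WITH_DATASET=1``. A minimal sketch of the
resulting commands, assuming an ``arrow`` checkout in the current
directory, ``$ARROW_HOME`` set as the guide describes, and the usual
``make install`` step from the guide (other cmake flags omitted for
brevity):

```shell
# Configure and install Arrow C++ with the dataset component enabled
# (mirrors the flags shown in the diff; remaining flags left out here).
mkdir -p arrow/cpp/build
pushd arrow/cpp/build
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
      -DCMAKE_INSTALL_LIBDIR=lib \
      -DCMAKE_BUILD_TYPE=Debug \
      -DARROW_DATASET=ON \
      ..
make install
popd

# Build pyarrow against the freshly installed C++ libraries,
# with dataset support switched on as the updated guide shows.
pushd arrow/python
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_DATASET=1
python setup.py build_ext --inplace
popd
```

This is a build-configuration fragment, not a definitive recipe; the
exact flag set and install step should follow the full guide.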
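The new ``dataset`` test group added to the list should follow the same
``--only-<group>`` convention the surrounding text describes for
``--only-parquet``. A hypothetical invocation (the ``--only-dataset``
flag name is inferred from that convention, not quoted from the guide):

```shell
# Run only the dataset test group, per the guide's --only-parquet example.
pushd arrow/python
python -m pytest pyarrow/tests --only-dataset
popd
```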
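After a successful build with ``PYARROW_WITH_DATASET=1``, dataset
support surfaces as the ``pyarrow.dataset`` module. A small sketch for
probing optional components without a hard import failure;
``has_component`` is a hypothetical helper written for this email, not
part of pyarrow:

```python
import importlib.util


def has_component(module_name: str) -> bool:
    """Return True if the named module can be found on this installation."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # The parent package (e.g. pyarrow itself) is not installed.
        return False


# After building with PYARROW_WITH_DATASET=1 this should report True:
#   has_component("pyarrow.dataset")
```

The equivalent one-liner from outside the source tree is
``python -c "import pyarrow.dataset"``.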