Repository: arrow
Updated Branches:
  refs/heads/master 76d56d3aa -> 6239abd1a

ARROW-862: [Python] Simplify README landing documentation to direct users and developers toward the documentation

Also migrates DEVELOPMENT.md to the Sphinx docs

Author: Wes McKinney <[email protected]>

Closes #584 from wesm/ARROW-862 and squashes the following commits:

50049dd [Wes McKinney] Revise python/README.md. Move DEVELOPMENT.md to Sphinx docs. Other cleaning
2187c1c [Wes McKinney] Migrate DEVELOPMENT.md to sphinx docs

Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/6239abd1
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/6239abd1
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/6239abd1

Branch: refs/heads/master
Commit: 6239abd1a61fc254818548a7b6ee3f8a88777a7f
Parents: 76d56d3
Author: Wes McKinney <[email protected]>
Authored: Mon Apr 24 15:58:19 2017 -0400
Committer: Wes McKinney <[email protected]>
Committed: Mon Apr 24 15:58:19 2017 -0400

----------------------------------------------------------------------
 python/DEVELOPMENT.md             | 207 -------------------------------
 python/README.md                  |  71 ++---------
 python/doc/source/development.rst | 215 +++++++++++++++++++++++++++++++++
 python/doc/source/index.rst       |   1 +
 python/doc/source/install.rst     | 117 ++----------------
 5 files changed, 236 insertions(+), 375 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/DEVELOPMENT.md
----------------------------------------------------------------------
diff --git a/python/DEVELOPMENT.md b/python/DEVELOPMENT.md
deleted file mode 100644
index 7f08169..0000000
--- a/python/DEVELOPMENT.md
+++ /dev/null
@@ -1,207 +0,0 @@
-<!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
--->
-
-## Developer guide for conda users
-
-### Linux and macOS
-
-#### System Requirements
-
-On macOS, any modern XCode (6.4 or higher; the current version is 8.3.1) is
-sufficient.
-
-On Linux, for this guide, we recommend using gcc 4.8 or 4.9, or clang 3.7 or
-higher. You can check your version by running
-
-```shell
-$ gcc --version
-```
-
-On Ubuntu 16.04 and higher, you can obtain gcc 4.9 with:
-
-```shell
-$ sudo apt-get install g++-4.9
-```
-
-Finally, set gcc 4.9 as the active compiler using:
-
-```shell
-export CC=gcc-4.9
-export CXX=g++-4.9
-```
-
-#### Environment Setup and Build
-
-First, let's create a conda environment with all the C++ build and Python
-dependencies from conda-forge:
-
-```shell
-conda create -y -q -n pyarrow-dev \
-      python=3.6 numpy six setuptools cython pandas pytest \
-      cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib \
-      brotli jemalloc -c conda-forge
-source activate pyarrow-dev
-```
-
-Now, let's clone the Arrow and Parquet git repositories:
-
-```shell
-mkdir repos
-cd repos
-git clone https://github.com/apache/arrow.git
-git clone https://github.com/apache/parquet-cpp.git
-```
-
-You should now see
-
-```shell
-$ ls -l
-total 8
-drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 arrow/
-drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 parquet-cpp/
-```
-
-We need to set a number of environment variables to let Arrow's build system
-know about our build toolchain:
-
-```
-export ARROW_BUILD_TYPE=release
-
-export BOOST_ROOT=$CONDA_PREFIX
-export BOOST_LIBRARYDIR=$CONDA_PREFIX/lib
-
-export FLATBUFFERS_HOME=$CONDA_PREFIX
-export RAPIDJSON_HOME=$CONDA_PREFIX
-export THRIFT_HOME=$CONDA_PREFIX
-export ZLIB_HOME=$CONDA_PREFIX
-export SNAPPY_HOME=$CONDA_PREFIX
-export BROTLI_HOME=$CONDA_PREFIX
-export JEMALLOC_HOME=$CONDA_PREFIX
-export ARROW_HOME=$CONDA_PREFIX
-export PARQUET_HOME=$CONDA_PREFIX
-```
-
-Now build and install the Arrow C++ libraries:
-
-```shell
-mkdir arrow/cpp/build
-pushd arrow/cpp/build
-
-cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
-      -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
-      -DARROW_PYTHON=on \
-      -DARROW_BUILD_TESTS=OFF \
-      ..
-make -j4
-make install
-popd
-```
-
-Now build and install the Apache Parquet libraries in your toolchain:
-
-```shell
-mkdir parquet-cpp/build
-pushd parquet-cpp/build
-
-cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
-      -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
-      -DPARQUET_BUILD_BENCHMARKS=off \
-      -DPARQUET_BUILD_EXECUTABLES=off \
-      -DPARQUET_ZLIB_VENDORED=off \
-      -DPARQUET_BUILD_TESTS=off \
-      ..
-
-make -j4
-make install
-popd
-```
-
-Now, build pyarrow:
-
-```shell
-cd arrow/python
-python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --inplace
-```
-
-You should be able to run the unit tests with:
-
-```shell
-$ py.test pyarrow
-================================ test session starts ================================
-platform linux -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
-rootdir: /home/wesm/arrow-clone/python, inifile:
-collected 198 items
-
-pyarrow/tests/test_array.py ...........
-pyarrow/tests/test_convert_builtin.py .....................
-pyarrow/tests/test_convert_pandas.py .............................
-pyarrow/tests/test_feather.py ..........................
-pyarrow/tests/test_hdfs.py sssssssssssssss
-pyarrow/tests/test_io.py ..................
-pyarrow/tests/test_ipc.py ........
-pyarrow/tests/test_jemalloc.py ss
-pyarrow/tests/test_parquet.py ....................
-pyarrow/tests/test_scalars.py ..........
-pyarrow/tests/test_schema.py .........
-pyarrow/tests/test_table.py .............
-pyarrow/tests/test_tensor.py ................
-
-====================== 181 passed, 17 skipped in 0.98 seconds =======================
-```
-
-### Windows
-
-First, make sure you can [build the C++ library][1].
-
-Now, we need to build and install the C++ libraries someplace.
-
-```shell
-mkdir cpp\build
-cd cpp\build
-set ARROW_HOME=C:\thirdparty
-cmake -G "Visual Studio 14 2015 Win64" ^
-      -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
-      -DCMAKE_BUILD_TYPE=Release ^
-      -DARROW_BUILD_TESTS=off ^
-      -DARROW_PYTHON=on ..
-cmake --build . --target INSTALL --config Release
-cd ..\..
-```
-
-After that, we must put the install directory's bin path in our `%PATH%`:
-
-```shell
-set PATH=%ARROW_HOME%\bin;%PATH%
-```
-
-Now, we can build pyarrow:
-
-```shell
-cd python
-python setup.py build_ext --inplace
-```
-
-#### Running C++ unit tests with Python
-
-Getting `python-test.exe` to run is a bit tricky because your `%PYTHONPATH%`
-must be configured given the active conda environment:
-
-```shell
-set CONDA_ENV=C:\Users\wesm\Miniconda\envs\arrow-test
-set PYTHONPATH=%CONDA_ENV%\Lib;%CONDA_ENV%\Lib\site-packages;%CONDA_ENV%\python35.zip;%CONDA_ENV%\DLLs;%CONDA_ENV%
-```
-
-Now `python-test.exe` or simply `ctest` (to run all tests) should work.
-
-[1]: https://github.com/apache/arrow/blob/master/cpp/doc/Windows.md
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/README.md
----------------------------------------------------------------------
diff --git a/python/README.md b/python/README.md
index ed008ea..816fbf0 100644
--- a/python/README.md
+++ b/python/README.md
@@ -18,78 +18,31 @@
 This library provides a Pythonic API wrapper for the reference Arrow C++
 implementation, along with tools for interoperability with pandas, NumPy, and
 other traditional Python scientific computing packages.

-### Development details
-
-This project is layered in two pieces:
-
-* arrow_python, a library part of the main Arrow C++ project for Python,
-  pandas, and NumPy interoperability
-* Cython extensions and pure Python code under pyarrow/ which expose Arrow C++
-  and pyarrow to pure Python users
+## Installing

-#### PyArrow Dependencies:
-
-To build pyarrow, first build and install Arrow C++ with the Python component
-enabled using `-DARROW_PYTHON=on`, see
-(https://github.com/apache/arrow/blob/master/cpp/README.md) . These components
-must be installed either in the default system location (e.g. `/usr/local`) or
-in a custom `$ARROW_HOME` location.
+Across platforms, you can install a recent version of pyarrow with the conda
+package manager:

 ```shell
-mkdir cpp/build
-pushd cpp/build
-cmake -DARROW_PYTHON=on -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
-make -j4
-make install
-```
-
-If you build with a custom `CMAKE_INSTALL_PREFIX`, during development, you must
-set `ARROW_HOME` as an environment variable and add it to your
-`LD_LIBRARY_PATH` on Linux and OS X:
-
-```bash
-export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ARROW_HOME/lib
-```
-
-5. **Python dependencies: numpy, pandas, cython, pytest**
-
-#### Build pyarrow and run the unit tests
-
-```bash
-python setup.py build_ext --inplace
-py.test pyarrow
-```
-
-To change the build type, use the `--build-type` option or set
-`$PYARROW_BUILD_TYPE`:
-
-```bash
-python setup.py build_ext --build-type=release --inplace
+conda install pyarrow -c conda-forge
 ```

-To pass through other build options to CMake, set the environment variable
-`$PYARROW_CMAKE_OPTIONS`.
-
-#### Build the pyarrow Parquet file extension
+On Linux, you can also install binary wheels from PyPI with pip:

-To build the integration with [parquet-cpp][1], pass `--with-parquet` to
-the `build_ext` option in setup.py:
-
-```
-python setup.py build_ext --with-parquet install
+```shell
+pip install pyarrow
 ```

-Alternately, add `-DPYARROW_BUILD_PARQUET=on` to the general CMake options.
+### Development details

-```
-export PYARROW_CMAKE_OPTIONS=-DPYARROW_BUILD_PARQUET=on
-```
+See the [Development][2] page in the documentation.

-#### Build the documentation
+### Building the documentation

 ```bash
 pip install -r doc/requirements.txt
 python setup.py build_sphinx -s doc/source
 ```

-[1]: https://github.com/apache/parquet-cpp
\ No newline at end of file
+[1]: https://github.com/apache/parquet-cpp
+[2]: https://github.com/apache/arrow/blob/master/python/doc/source/development.rst
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/doc/source/development.rst
----------------------------------------------------------------------
diff --git a/python/doc/source/development.rst b/python/doc/source/development.rst
new file mode 100644
index 0000000..01add11
--- /dev/null
+++ b/python/doc/source/development.rst
@@ -0,0 +1,215 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. currentmodule:: pyarrow
+.. _development:
+
+***********
+Development
+***********
+
+Developing with conda
+=====================
+
+Linux and macOS
+---------------
+
+System Requirements
+~~~~~~~~~~~~~~~~~~~
+
+On macOS, any modern XCode (6.4 or higher; the current version is 8.3.1) is
+sufficient.
+
+On Linux, for this guide, we recommend using gcc 4.8 or 4.9, or clang 3.7 or
+higher. You can check your version by running
+
+.. code-block:: shell
+
+   $ gcc --version
+
+On Ubuntu 16.04 and higher, you can obtain gcc 4.9 with:
+
+.. code-block:: shell
+
+   $ sudo apt-get install g++-4.9
+
+Finally, set gcc 4.9 as the active compiler using:
+
+.. code-block:: shell
+
+   export CC=gcc-4.9
+   export CXX=g++-4.9
+
+Environment Setup and Build
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+First, let's create a conda environment with all the C++ build and Python
+dependencies from conda-forge:
+
+.. code-block:: shell
+
+   conda create -y -q -n pyarrow-dev \
+         python=3.6 numpy six setuptools cython pandas pytest \
+         cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib \
+         brotli jemalloc -c conda-forge
+   source activate pyarrow-dev
+
+Now, let's clone the Arrow and Parquet git repositories:
+
+.. code-block:: shell
+
+   mkdir repos
+   cd repos
+   git clone https://github.com/apache/arrow.git
+   git clone https://github.com/apache/parquet-cpp.git
+
+You should now see
+
+
+.. code-block:: shell
+
+   $ ls -l
+   total 8
+   drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 arrow/
+   drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 parquet-cpp/
+
+We need to set some environment variables to let Arrow's build system know
+about our build toolchain:
+
+.. code-block:: shell
+
+   export ARROW_BUILD_TYPE=release
+   export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX
+   export PARQUET_BUILD_TOOLCHAIN=$CONDA_PREFIX
+
+Now build and install the Arrow C++ libraries:
+
+.. code-block:: shell
+
+   mkdir arrow/cpp/build
+   pushd arrow/cpp/build
+
+   cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
+         -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
+         -DARROW_PYTHON=on \
+         -DARROW_BUILD_TESTS=OFF \
+         ..
+   make -j4
+   make install
+   popd
+
+Now, optionally build and install the Apache Parquet libraries in your
+toolchain:
+
+.. code-block:: shell
+
+   mkdir parquet-cpp/build
+   pushd parquet-cpp/build
+
+   cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
+         -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
+         -DPARQUET_BUILD_BENCHMARKS=off \
+         -DPARQUET_BUILD_EXECUTABLES=off \
+         -DPARQUET_ZLIB_VENDORED=off \
+         -DPARQUET_BUILD_TESTS=off \
+         ..
+
+   make -j4
+   make install
+   popd
+
+Now, build pyarrow:
+
+.. code-block:: shell
+
+   cd arrow/python
+   python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \
+          --with-parquet --with-jemalloc --inplace
+
+If you did not build parquet-cpp, you can omit ``--with-parquet``.
+
+You should be able to run the unit tests with:
+
+.. code-block:: shell
+
+   $ py.test pyarrow
+   ================================ test session starts ====================
+   platform linux -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
+   rootdir: /home/wesm/arrow-clone/python, inifile:
+   collected 198 items
+
+   pyarrow/tests/test_array.py ...........
+   pyarrow/tests/test_convert_builtin.py .....................
+   pyarrow/tests/test_convert_pandas.py .............................
+   pyarrow/tests/test_feather.py ..........................
+   pyarrow/tests/test_hdfs.py sssssssssssssss
+   pyarrow/tests/test_io.py ..................
+   pyarrow/tests/test_ipc.py ........
+   pyarrow/tests/test_jemalloc.py ss
+   pyarrow/tests/test_parquet.py ....................
+   pyarrow/tests/test_scalars.py ..........
+   pyarrow/tests/test_schema.py .........
+   pyarrow/tests/test_table.py .............
+   pyarrow/tests/test_tensor.py ................
+
+   ====================== 181 passed, 17 skipped in 0.98 seconds ===========
+
+Windows
+=======
+
+First, make sure you can `build the C++ library <https://github.com/apache/arrow/blob/master/cpp/doc/Windows.md>`_.
+
+Now, we need to build and install the C++ libraries someplace.
+
+.. code-block:: shell
+
+   mkdir cpp\build
+   cd cpp\build
+   set ARROW_HOME=C:\thirdparty
+   cmake -G "Visual Studio 14 2015 Win64" ^
+         -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
+         -DCMAKE_BUILD_TYPE=Release ^
+         -DARROW_BUILD_TESTS=off ^
+         -DARROW_PYTHON=on ..
+   cmake --build . --target INSTALL --config Release
+   cd ..\..
+
+After that, we must put the install directory's bin path in our ``%PATH%``:
+
+.. code-block:: shell
+
+   set PATH=%ARROW_HOME%\bin;%PATH%
+
+Now, we can build pyarrow:
+
+.. code-block:: shell
+
+   cd python
+   python setup.py build_ext --inplace
+
+Running C++ unit tests with Python
+----------------------------------
+
+Getting ``python-test.exe`` to run is a bit tricky because your
+``%PYTHONPATH%`` must be configured given the active conda environment:
+
+.. code-block:: shell
+
+   set CONDA_ENV=C:\Users\wesm\Miniconda\envs\arrow-test
+   set PYTHONPATH=%CONDA_ENV%\Lib;%CONDA_ENV%\Lib\site-packages;%CONDA_ENV%\python35.zip;%CONDA_ENV%\DLLs;%CONDA_ENV%
+
+Now ``python-test.exe`` or simply ``ctest`` (to run all tests) should work.

http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/doc/source/index.rst
----------------------------------------------------------------------
diff --git a/python/doc/source/index.rst b/python/doc/source/index.rst
index ecb8e8f..55b4efc 100644
--- a/python/doc/source/index.rst
+++ b/python/doc/source/index.rst
@@ -35,6 +35,7 @@ structures.
    :caption: Getting Started

    install
+   development
    pandas
    filesystems
    parquet

http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/doc/source/install.rst
----------------------------------------------------------------------
diff --git a/python/doc/source/install.rst b/python/doc/source/install.rst
index 278b466..a2a6520 100644
--- a/python/doc/source/install.rst
+++ b/python/doc/source/install.rst
@@ -37,115 +37,14 @@ Install the latest version from PyPI:
     pip install pyarrow

 .. note::
-    Currently there are only binary artifcats available for Linux and MacOS.
-    Otherwise this will only pull the python sources and assumes an existing
-    installation of the C++ part of Arrow.
-    To retrieve the binary artifacts, you'll need a recent ``pip`` version that
-    supports features like the ``manylinux1`` tag.
-
-Building from source
---------------------
-
-First, clone the master git repository:
-
-.. code-block:: bash
-
-    git clone https://github.com/apache/arrow.git arrow
-
-System requirements
-~~~~~~~~~~~~~~~~~~~
-
-Building pyarrow requires:
-
-* A C++11 compiler
-
-  * Linux: gcc >= 4.8 or clang >= 3.5
-  * OS X: XCode 6.4 or higher preferred
-
-* `CMake <https://cmake.org/>`_
-
-Python requirements
-~~~~~~~~~~~~~~~~~~~
-
-You will need Python (CPython) 2.7, 3.4, or 3.5 installed. Earlier releases and
-are not being targeted.
-
-.. note::
-    This library targets CPython only due to an emphasis on interoperability with
-    pandas and NumPy, which are only available for CPython.
-
-The build requires NumPy, Cython, and a few other Python dependencies:
-
-.. code-block:: bash
-
-    pip install cython
-    cd arrow/python
-    pip install -r requirements.txt
-
-Installing Arrow C++ library
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-First, you should choose an installation location for Arrow C++. In the future
-using the default system install location will work, but for now we are being
-explicit:
-
-.. code-block:: bash
-
-    export ARROW_HOME=$HOME/local
-
-Now, we build Arrow:
-
-.. code-block:: bash
-
-    cd arrow/cpp
-
-    mkdir dev-build
-    cd dev-build
-
-    cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
-
-    make
-
-    # Use sudo here if $ARROW_HOME requires it
-    make install
-
-To get the optional Parquet support, you should also build and install
-`parquet-cpp <https://github.com/apache/parquet-cpp/blob/master/README.md>`_.
-Install `pyarrow`
-~~~~~~~~~~~~~~~~~
-
-
-.. code-block:: bash
-
-    cd arrow/python
-
-    # --with-parquet enables the Apache Parquet support in PyArrow
-    # --with-jemalloc enables the jemalloc allocator support in PyArrow
-    # --build-type=release disables debugging information and turns on
-    # compiler optimizations for native code
-    python setup.py build_ext --with-parquet --with-jemalloc --build-type=release install
-    python setup.py install
-
-.. warning::
-    On XCode 6 and prior there are some known OS X `@rpath` issues. If you are
-    unable to import pyarrow, upgrading XCode may be the solution.
-
-.. note::
-    In development installations, you will also need to set a correct
-    ``LD_LIBRARY_PATH``. This is most probably done with
-    ``export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH``.
-
-
-.. code-block:: python
+    Currently there are only binary artifacts available for Linux and MacOS.
+    Otherwise this will only pull the python sources and assumes an existing
+    installation of the C++ part of Arrow. To retrieve the binary artifacts,
+    you'll need a recent ``pip`` version that supports features like the
+    ``manylinux1`` tag.

-    In [1]: import pyarrow
+Installing from source
+----------------------

-    In [2]: pyarrow.array([1,2,3])
-    Out[2]:
-    <pyarrow.array.Int64Array object at 0x7f899f3e60e8>
-    [
-      1,
-      2,
-      3
-    ]
+See :ref:`development`.
