Neal Richardson created ARROW-5222:
--------------------------------------
Summary: [Python] Issues with installing pyarrow for development
on MacOS
Key: ARROW-5222
URL: https://issues.apache.org/jira/browse/ARROW-5222
Project: Apache Arrow
Issue Type: Improvement
Components: Documentation, Python
Reporter: Neal Richardson
Fix For: 0.14.0
I tried following the
[instructions|https://github.com/apache/arrow/blob/master/docs/source/developers/python.rst]
for installing pyarrow for developers on macOS, and I ran into quite a bit of
difficulty. I'm hoping we can improve our documentation and/or tooling to make
this a smoother process.
I know we can't anticipate every quirk of everyone's dev environment, but in my
case, I was getting set up on a new machine, so this was from a clean slate.
I'm also new to contributing to the project, so I'm a "clean slate" in that
regard too, and my ignorance may be exposing other assumptions in the docs.
# The instructions recommend using conda, but as this [Stack Overflow
question|https://stackoverflow.com/questions/55798166/cmake-fails-with-when-attempting-to-compile-simple-test-program]
notes, cmake fails when attempting to compile a simple test program. Uwe
helpfully suggested installing an older macOS SDK from
[here|https://github.com/phracker/MacOSX-SDKs/releases]. That may work, but I'm
personally wary of installing binaries from an unofficial GitHub account, let
alone recording that in our docs as an official recommendation. Either way, we
should update the docs, either to note this requirement or to recommend against
installing with conda on macOS.
# After that, I tried to go the Homebrew path. Ultimately this did succeed,
but it was rough. It seemed that I had to `brew install` a lot of packages that
weren't included in arrow/python/Brewfile, iterating by hand: run `cmake`, see
which missing dependency it failed on, `brew install` it, and retry. Among the
libs I installed this way were double-conversion, snappy, brotli, protobuf,
gtest, rapidjson, flatbuffers, lz4, zstd, c-ares, and boost. It's not clear how
many of these extra dependencies were needed only because I'd installed just
the Xcode command-line tools and not the full Xcode from the App Store;
regardless, the Brewfile should be complete if we want to use it.
# In searching Jira for the double-conversion issue (the first one I hit), I
found [this issue/PR|https://github.com/apache/arrow/pull/4132/files], which
added double-conversion to a different Brewfile, in c_glib. So I tried
installing from that Brewfile with `brew bundle`. It would probably be good to
have a common Brewfile for the C++ setup, which the python and glib ones could
load before adding any extra dependencies of their own. That way, there's one
place to add common dependencies.
# I got close here but still had issues with `BOOST_HOME` not being found,
even though I had brew-installed it. From the console output, it appeared that
even though I was not using conda and did not have an active conda environment
(I'd even done `conda env remove --name pyarrow-dev`), the cmake configuration
script detected that conda existed and decided to use conda to resolve
dependencies. I tried setting lots of different environment variables to tell
cmake not to use conda, but ultimately I was only able to get past this by
deleting conda from my system entirely.
# This let me get to the point of being able to `import pyarrow`. But then
running tests failed because the `hypothesis` package was not installed. I see
that it is included in requirements-test.txt and setup.py under tests_require,
but I followed the installation instructions and this package did not end up in
my virtualenv. `pip install hypothesis` resolved it.
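To make the Brewfile gap from item 2 concrete, here is a minimal sketch (not part of the Arrow tooling) of how a Brewfile could be checked against a list of expected formulae. The function names and the example Brewfile contents are hypothetical, and the dependency names are copied from the list above, so they may not match the Homebrew formula names exactly.

```python
import re

# Libraries that had to be installed by hand, copied from item 2 above.
# Assumption: these are the names as written in the report, not verified
# Homebrew formula names.
REPORTED_DEPS = [
    "double-conversion", "snappy", "brotli", "protobuf", "gtest", "rapidjson",
    "flatbuffers", "lz4", "zstd", "c-ares", "boost",
]

# Matches Brewfile entries such as: brew "boost"   or   brew 'boost', args: [...]
BREW_LINE = re.compile(r"""^\s*brew\s+["']([^"']+)["']""")


def brewfile_formulae(text):
    """Return the set of formula names declared by `brew` lines in Brewfile text."""
    names = set()
    for line in text.splitlines():
        match = BREW_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_formulae(text, required=REPORTED_DEPS):
    """Return the required formulae the Brewfile does not declare, in input order."""
    declared = brewfile_formulae(text)
    return [name for name in required if name not in declared]


# Hypothetical Brewfile contents, for illustration only (not the real file).
EXAMPLE_BREWFILE = '''
# comment lines are ignored
brew "boost"
brew "cmake"
brew "lz4"
'''

# Print whatever is still missing as a space-separated list.
print(" ".join(missing_formulae(EXAMPLE_BREWFILE)))
```

Run against the real arrow/python/Brewfile, the printed names could be passed to `brew install` or added to the Brewfile. This replaces the cmake/install/retry loop with a single pass, but only for dependencies that are already known.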