As Arrow/PyArrow gains more compute functions and features, we may see a
growing number of users relying on PyArrow without going through Pandas or
NumPy.

NumPy is a compile-time dependency for PyArrow, as it's required to compile
the C++ code that implements the pandas/numpy integration, but there has
been some discussion about making NumPy optional at runtime (removing it
from the required dependencies of the Python distribution). You would have
to install numpy only if you need to call to_numpy, to_pandas, or similar
integration features. For all other use cases, which rely on Arrow alone,
you would be able to pip install pyarrow without pulling in any other
dependency and be ready to go.

Technically it seems a bit complicated. Python/Cython can always work
around missing libraries, but we would have to find ways to load numpy
lazily from the C++ side. I don't know if this was already discussed in
the past, and whether someone already has solutions for this part of the
problem, but before investing time and effort in research I think it makes
sense to make sure it's a goal that the development team agrees with.
