This change seems worth attempting. The hardest part will likely be projects that use the arrow/python/pyarrow.h C++ API; we would have to provide a viable migration path for them. turbodbc is one example:
https://github.com/blue-yonder/turbodbc/search?l=C%2B%2B&q=pyarrow.h

On Mon, Aug 16, 2021 at 6:00 PM Eduardo Ponce <edponc...@gmail.com> wrote:
>
> I agree with this proposal; the Arrow C++ library does not need to depend
> on Python or PyArrow code.
> AFAIU this will eliminate the use of the -DARROW_PYTHON build flag for
> Arrow C++, given that Python-related code will be compiled with PyArrow
> builds. Besides the use of the "ARROW_PYTHON" variable in CMakeLists.txt,
> the "dbi/hiveserver2" build makes use of "ARROW_PYTHON_SHARED_LINK_LIBS" [1].
>
> [1]
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/dbi/hiveserver2/CMakeLists.txt#L90
>
> ~Eduardo
>
> On Mon, Aug 16, 2021 at 11:24 AM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > I definitely think this is desirable.
> >
> > There's probably going to be a bit of work getting it to pass on all CI
> > (including the various nightly builds).
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 16/08/2021 at 17:08, Alessandro Molina wrote:
> > > PyArrow is currently a full Cython codebase, but in reality it relies
> > > on some classes and functions that are implemented in C++ within the
> > > src/arrow/python directory (
> > > https://github.com/apache/arrow/tree/master/cpp/src/arrow/python
> > > ), especially the NumPy/pandas conversion code, which has to interface
> > > with NumPy array data at a low level.
> > >
> > > When working on PyArrow, it's not uncommon to end up jumping back and
> > > forth between the Arrow C++ code for Python and PyArrow itself. You can
> > > also end up with integration issues, sometimes hard to catch, if you
> > > forget to recompile libarrow, even when working on a Python-only
> > > change.
> > >
> > > I'm wondering whether it would make life easier for contributors if
> > > the src/arrow/python directory were moved into pyarrow and PyArrow
> > > were made able to build it.
> > >
> > > That would probably reduce the risk of integration issues, as
> > > rebuilding pyarrow alone would probably be enough for most
> > > Python-specific changes (it would also rebuild the Python-specific
> > > C++).
> > >
> > > I think that moving src/arrow/python into pyarrow would also make the
> > > codebase more cohesive, which would lower the barrier for new
> > > contributors looking for how to fix a pyarrow-specific issue.
> > >
> > > Unless there is a major side effect that I'm missing (other than
> > > somewhat more complex build scripts for pyarrow, but the build is
> > > already CMake-based, so building some C++ shouldn't be a big deal),
> > > it seems that the benefits of having all Python-related code in a
> > > single place would outweigh the drawbacks.
> > >
> > > Also, I'm not sure how widespread the requirement of Python from C++
> > > is, but it seems to me that if we moved all Python-specific code into
> > > pyarrow, we could decouple libarrow from Python. That might make it
> > > easier to deal with virtualenvs or debug versions of Python, as you
> > > wouldn't have to deal with Python3_EXECUTABLE etc. when building
> > > libarrow.
> > >
> > > Any thoughts?
> > >
> >