This change seems worth attempting. The hardest part might be projects
that use the arrow/python/pyarrow.h C++ API, so we would have to provide
a viable migration path for them. turbodbc is one example:

https://github.com/blue-yonder/turbodbc/search?l=C%2B%2B&q=pyarrow.h
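
To gauge how much a given downstream project would be affected, the
search above can also be run against a local checkout. The following is
only a rough sketch: the function name, the regular expression, and the
list of file suffixes are illustrative assumptions, not part of Arrow or
turbodbc.

```python
import re
from pathlib import Path

# Matches '#include <arrow/python/pyarrow.h>' or the quoted form.
# (Hypothetical helper pattern; adjust if a project uses other include styles.)
INCLUDE_RE = re.compile(
    r'^\s*#\s*include\s*[<"]arrow/python/pyarrow\.h[>"]', re.MULTILINE
)

# Assumed set of C/C++ source and header suffixes to scan.
CPP_SUFFIXES = {".h", ".hh", ".hpp", ".c", ".cc", ".cpp", ".cxx"}


def find_pyarrow_header_users(root):
    """Return sorted relative paths of C/C++ files under root that include the header."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in CPP_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if INCLUDE_RE.search(text):
                hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

Calling find_pyarrow_header_users() on the path of a project checkout
lists the files that would need to follow whatever migration path we
end up providing.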

On Mon, Aug 16, 2021 at 6:00 PM Eduardo Ponce <edponc...@gmail.com> wrote:
>
> I agree with this proposal. The Arrow C++ library does not need to depend
> on Python or PyArrow code.
> AFAIU this will eliminate the -DARROW_PYTHON build flag for Arrow C++,
> given that Python-related code will be compiled with PyArrow builds.
> Besides the use of the "ARROW_PYTHON" CMake option in CMakeLists.txt, the
> "dbi/hiveserver2" build makes use of "ARROW_PYTHON_SHARED_LINK_LIBS" [1].
>
> [1]
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/dbi/hiveserver2/CMakeLists.txt#L90
>
> ~Eduardo
>
> On Mon, Aug 16, 2021 at 11:24 AM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > I definitely think this is desirable.
> >
> > There's probably going to be a bit of work getting it to pass on all CI
> > (including the various nightly builds).
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 16/08/2021 at 17:08, Alessandro Molina wrote:
> > > PyArrow is currently a full Cython codebase, but in reality it relies on
> > > some classes and functions that are implemented in C++ within the
> > > src/arrow/python directory (
> > > https://github.com/apache/arrow/tree/master/cpp/src/arrow/python
> > > ), especially the NumPy/pandas conversion code that has to interface
> > > with NumPy array data at a low level.
> > >
> > > When working on PyArrow, it's not uncommon to end up jumping back and
> > > forth between the Arrow C++ codebase for Python and PyArrow. You can
> > > also end up with integration issues that are sometimes hard to catch
> > > if you forget to recompile libarrow, even when you are working on a
> > > Python-only change.
> > >
> > > I'm wondering if it wouldn't make life easier for contributors if the
> > > src/arrow/python directory was moved into pyarrow and we made PyArrow
> > > able to build it.
> > >
> > > That would probably reduce the risk of integration issues, as rebuilding
> > > pyarrow alone would probably be enough for most Python-specific changes
> > > (it would also rebuild the Python-specific C++).
> > >
> > > I think that moving src/arrow/python into pyarrow would also make the
> > > codebase more cohesive, which would lower the barrier for new
> > > contributors looking for how to fix a pyarrow-specific issue.
> > >
> > > Unless there is a major side effect that I'm missing (beyond having
> > > slightly more complex build scripts for pyarrow, but the build is already
> > > CMake-based, so building some C++ shouldn't be a big deal), it seems that
> > > the benefits of having all Python-related code in a single place would
> > > outweigh the downsides.
> > >
> > > Also, I'm not sure how widespread the requirement of Python from C++ is,
> > > but it seems to me that if we moved all Python-specific code into pyarrow
> > > we could decouple libarrow from Python. That might make it easier to
> > > deal with virtualenvs or debug builds of Python, as you wouldn't have to
> > > deal with Python3_EXECUTABLE etc. when building libarrow.
> > >
> > > Any thoughts?
> > >
> >
