[ https://issues.apache.org/jira/browse/ARROW-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-8518:
----------------------------------
Labels: pull-request-available (was: )
> [Python] Create tools to enable optional components (like Gandiva, Flight) to be built and deployed as separate Python packages
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-8518
> URL: https://issues.apache.org/jira/browse/ARROW-8518
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Packaging, Python
> Reporter: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Our current monolithic approach to Python packaging isn't likely to be
> sustainable long-term.
> At a high level, I would propose a structure like this:
> {code}
> pip install pyarrow          # core package containing libarrow, libarrow_python, and any other common bundled C++ library dependencies
> pip install pyarrow-flight   # installs pyarrow, pyarrow_flight
> pip install pyarrow-gandiva  # installs pyarrow, pyarrow_gandiva
> {code}
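> Under this layout, code that wants to know which optional components are present could check for their top-level modules without importing them. A minimal sketch; the module names {{pyarrow_flight}} and {{pyarrow_gandiva}} come from the proposal above, while the {{COMPONENTS}} table and {{installed_components}} helper are hypothetical and not part of any released pyarrow:

```python
import importlib.util

# Hypothetical mapping from pip distribution name to the top-level module it
# installs, following the proposed layout (not an existing pyarrow API).
COMPONENTS = {
    "pyarrow-flight": "pyarrow_flight",
    "pyarrow-gandiva": "pyarrow_gandiva",
}


def installed_components(components=COMPONENTS):
    """Return the sorted distribution names whose top-level module can be found."""
    return sorted(
        dist
        for dist, module in components.items()
        # find_spec returns None for a missing top-level module and does not import it
        if importlib.util.find_spec(module) is not None
    )


if __name__ == "__main__":
    print(installed_components())
```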
> We can maintain the semantic appearance of a single {{pyarrow}} package by
> having thin API modules that would look like this:
> {code}
> # Contents of pyarrow/flight.py
> from pyarrow_flight import *
> {code}
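> With the one-line shim, a user who has only the core package installed gets a bare ImportError for {{pyarrow_flight}}. A sketch of a slightly friendlier shim, assuming we want an install hint in the error; {{load_component}} and {{reexport}} are hypothetical helper names, and {{reexport}} mirrors the star-import rules (use {{__all__}} if defined, otherwise copy every name without a leading underscore):

```python
import importlib
from types import ModuleType


def load_component(module_name: str, dist_name: str) -> ModuleType:
    """Import an optional component package, or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is not installed; try 'pip install {dist_name}'"
        ) from exc


def reexport(namespace: dict, module: ModuleType) -> None:
    """Copy public names into namespace, mirroring `from module import *`."""
    # Star-import uses __all__ when defined, else every name without a leading underscore
    names = getattr(module, "__all__", None)
    if names is None:
        names = [n for n in vars(module) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(module, name)


# A shim such as pyarrow/flight.py could then contain (hypothetical):
#     reexport(globals(), load_component("pyarrow_flight", "pyarrow-flight"))
```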
> Obviously, this is more difficult to build and package:
> * CMake and setup.py files must be refactored a bit so that we can reuse code
> between the parent and child packages
> * Separate conda and wheel packages must be produced. With conda this seems
> more straightforward, but since the child wheels depend on the parent core
> wheel, the wheel build process seems more complicated
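> One way to reuse setup.py code between the parent and child packages is a shared helper that produces each child's setuptools metadata. A minimal sketch, assuming setuptools and an exact version pin on the core wheel (the {{child_package_metadata}} name and the pin policy are assumptions, not something decided in this issue; "3.0.0" is the Fix For version used only as an example):

```python
# Hypothetical helper shared by the setup.py files of pyarrow's child packages.
def child_package_metadata(component: str, version: str) -> dict:
    """Build setuptools keyword arguments for a child package like pyarrow-flight."""
    return {
        "name": f"pyarrow-{component}",
        "version": version,
        "packages": [f"pyarrow_{component}"],
        # Assumed exact pin: the child's extension modules link against the
        # libarrow shipped inside the core pyarrow wheel of the same release.
        "install_requires": [f"pyarrow=={version}"],
    }


if __name__ == "__main__":
    from setuptools import setup

    setup(**child_package_metadata("flight", "3.0.0"))
```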
> In any case, I don't think these challenges are insurmountable. This will
> have several benefits:
> * Smaller installation footprint for simple use cases (though note we are
> STILL duplicating shared libraries in the wheels, which is quite bad)
> * Less developer anxiety about expanding the scope of what Python code is
> shipped from apache/arrow. If in 5 years we are shipping 5 different Python
> wheels with each Apache Arrow release, that sounds completely fine to me.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)