[ 
https://issues.apache.org/jira/browse/ARROW-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8518:
----------------------------------
    Labels: pull-request-available  (was: )

> [Python] Create tools to enable optional components (like Gandiva, Flight) to 
> be built and deployed as separate Python packages
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8518
>                 URL: https://issues.apache.org/jira/browse/ARROW-8518
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Packaging, Python
>            Reporter: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Our current monolithic approach to Python packaging isn't likely to be 
> sustainable long-term.
> At a high level, I would propose a structure like this:
> {code}
> # core package containing libarrow, libarrow_python, and any other common
> # bundled C++ library dependencies
> pip install pyarrow
> pip install pyarrow-flight  # installs pyarrow, pyarrow_flight
> pip install pyarrow-gandiva # installs pyarrow, pyarrow_gandiva
> {code}
> We can maintain the semantic appearance of a single {{pyarrow}} package by 
> having thin API modules that would look like
> {code}
> # Contents of pyarrow/flight.py
> from pyarrow_flight import *
> {code}
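
As a rough illustration of the thin API module idea, the sketch below does the same thing as `from pyarrow_flight import *` but raises a more helpful error when the optional child package is missing. The helper name `reexport` and the error wording are assumptions made for this example, not part of the proposal. Because `pyarrow_flight` does not exist yet, the demo registers a stand-in module in `sys.modules`.

```python
import importlib
import sys
import types


def reexport(child_name, namespace):
    """Copy the public names of the child package into `namespace`.

    Behaves like `from <child_name> import *`, but raises a clearer
    ImportError when the optional child package is not installed.
    `reexport` is a hypothetical helper, not an existing pyarrow API.
    """
    try:
        child = importlib.import_module(child_name)
    except ImportError as exc:
        # Assumption: the pip name is the module name with "_" -> "-",
        # as in the pyarrow-flight / pyarrow_flight pairing above.
        hint = child_name.replace("_", "-")
        raise ImportError(
            f"{child_name} is not installed; try `pip install {hint}`"
        ) from exc

    # Honor __all__ if the child defines it, else export non-underscore names.
    names = getattr(child, "__all__", None)
    if names is None:
        names = [n for n in vars(child) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(child, name)
    return list(names)


# Demo with a stand-in for the (not yet existing) pyarrow_flight package.
fake_child = types.ModuleType("pyarrow_flight")
fake_child.FlightClient = type("FlightClient", (), {})
sys.modules["pyarrow_flight"] = fake_child

demo_namespace = {}
exported = reexport("pyarrow_flight", demo_namespace)
```

Under these assumptions, `pyarrow/flight.py` would consist of a single call such as `reexport("pyarrow_flight", globals())`.
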
> Obviously, this is more difficult to build and package:
> * CMake and setup.py files must be refactored a bit so that we can reuse code 
> between the parent and child packages
> * Separate conda and wheel packages must be produced. With conda this seems 
> more straightforward, but since the child wheels depend on the parent core 
> wheel, the wheel build process seems more complicated
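
One way the metadata shared between child packages might be generated is sketched below. The function `child_package_metadata` is hypothetical, and pinning the child to the exact same `pyarrow` version is an assumption of this example (the child extension would presumably link against the libraries shipped in the core wheel), not something the issue specifies.

```python
def child_package_metadata(component, version):
    """Build setuptools keyword arguments for a child package.

    Hypothetical helper: the package naming follows the
    pyarrow-flight / pyarrow_flight pattern from the proposal.
    """
    return {
        "name": f"pyarrow-{component}",        # distribution name on PyPI
        "version": version,
        "packages": [f"pyarrow_{component}"],  # importable package name
        # Assumption: the child requires the core wheel at the same version.
        "install_requires": [f"pyarrow=={version}"],
    }


flight_metadata = child_package_metadata("flight", "3.0.0")
```

A child `setup.py` could then call `setuptools.setup(**child_package_metadata("flight", "3.0.0"))`, so the naming and dependency logic lives in one place for every child package.
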
> In any case, I don't think these challenges are insurmountable. This will 
> have several benefits:
> * Smaller installation footprint for simple use cases (though note we are 
> STILL duplicating shared libraries in the wheels, which is quite bad)
> * Less developer anxiety about expanding the scope of what Python code is 
> shipped from apache/arrow. If in 5 years we are shipping 5 different Python 
> wheels with each Apache Arrow release, that sounds completely fine to me. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
