[ 
https://issues.apache.org/jira/browse/ARROW-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612509#comment-17612509
 ] 

Joris Van den Bossche commented on ARROW-17916:
-----------------------------------------------

Dataset can already be disabled (we have a "failure_permitted" for that in 
setup.py, and at least in the past we had a nightly build that covered 
this). And I suppose HDFS is also already optional?

But it would indeed be good to make more of the others optional as well. 
Compute would probably give the biggest benefit, although it is probably also 
the most difficult one. In the Cython code this is actually already handled 
using the {{_pc}} object (so that we can call compute functions in lib.pyx 
without importing the module directly). But the PyArrow C++ code also depends 
on Compute for casting (and we depend on that in the NumPy/pandas <-> Arrow 
conversion, which is currently a part that is not meant to be optional).

> [Python] Allow disabling more components
> ----------------------------------------
>
>                 Key: ARROW-17916
>                 URL: https://issues.apache.org/jira/browse/ARROW-17916
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Python
>    Affects Versions: 9.0.0
>            Reporter: Antoine Pitrou
>            Priority: Major
>             Fix For: 11.0.0
>
>
> Some users would like to build lightweight versions of PyArrow, for example 
> for use in AWS Lambda or similar systems which constrain the total size of 
> usable libraries.
> However, PyArrow currently mandates some Arrow C++ components which can lead 
> to a very sizable Arrow binary install: Compute, CSV, Dataset, Filesystem, 
> HDFS and JSON.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
