[
https://issues.apache.org/jira/browse/ARROW-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612509#comment-17612509
]
Joris Van den Bossche commented on ARROW-17916:
-----------------------------------------------
Dataset can already be disabled (we have a "failure_permitted" for that in
setup.py, and at least in the past we had a nightly build that covered
this). And I suppose HDFS is also already optional?
But it would indeed be good to make more of the others optional as well.
Compute would probably give the biggest benefit, although it is probably also
the most difficult one. In the Cython code this is actually already handled
using the {{_pc}} object (so that we can call compute functions in lib.pyx
without importing the module directly). But the PyArrow C++ code also depends
on Compute for casting (and we depend on that in the NumPy/pandas <-> Arrow
conversion, which is currently a part that is not meant to be optional).
> [Python] Allow disabling more components
> ----------------------------------------
>
> Key: ARROW-17916
> URL: https://issues.apache.org/jira/browse/ARROW-17916
> Project: Apache Arrow
> Issue Type: Wish
> Components: Python
> Affects Versions: 9.0.0
> Reporter: Antoine Pitrou
> Priority: Major
> Fix For: 11.0.0
>
>
> Some users would like to build lightweight versions of PyArrow, for example
> for use in AWS Lambda or similar systems which constrain the total size of
> usable libraries.
> However, PyArrow currently mandates some Arrow C++ components which can lead
> to a very sizable Arrow binary install: Compute, CSV, Dataset, Filesystem,
> HDFS and JSON.