[
https://issues.apache.org/jira/browse/ARROW-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147676#comment-17147676
]
Antoine Pitrou commented on ARROW-6776:
---------------------------------------
The latest PyArrow wheels (*) are much lighter:
{code}
$ du -hs venv-3.7/lib/python3.7/site-packages/pyarrow/
57M venv-3.7/lib/python3.7/site-packages/pyarrow/
{code}
(*) See here for nightly PyArrow wheels:
https://arrow.apache.org/docs/python/install.html#installing-nightly-packages
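For anyone who wants to reproduce the measurement above without `du`, here is a minimal Python sketch using only the standard library. The function names `directory_size_bytes` and `format_size` are hypothetical helpers, not part of PyArrow. The sketch sums apparent file sizes, whereas `du` reports disk usage, so the two numbers may differ slightly.

```python
import os
import sys


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def format_size(num_bytes):
    """Format a byte count with a binary unit suffix, similar in spirit to du -h."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024 or unit == "G":
            return f"{size:.0f}{unit}"
        size /= 1024


if __name__ == "__main__":
    # Pass the installed package directory, e.g. .../site-packages/pyarrow/
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(format_size(directory_size_bytes(target)), target)
```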
> [Python] Need a lite version of pyarrow
> ---------------------------------------
>
> Key: ARROW-6776
> URL: https://issues.apache.org/jira/browse/ARROW-6776
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 0.14.1
> Reporter: Haowei Yu
> Priority: Major
>
> I am currently building a library package on top of pyarrow, so I include
> pyarrow as a dependency and ship it to our customers. However, when a
> customer installs our package, it also installs pyarrow and pyarrow's
> dependency (numpy), and the combined dependency size is huge.
> {code:bash}
> (py36env) [hyu@c6x64-hyu-newuser-final-clone connector]$ ls -l --block-size=M /home/hyu/py36env/lib/python3.6/site-packages/pyarrow/
> total 186M
> {code}
> And numpy is around 80 MB, so the total is more than 250 MB.
> Our customers want to bundle all dependencies and run the code inside AWS
> Lambda, but they hit the size limit and the code fails to run.
> Looking into pyarrow, I saw that multiple .so files are shipped both with and
> without a version suffix. I wonder if you could remove one of each pair (either
> with or without the suffix). At the very least, that would reduce the package size by half.
> Further, our library only needs to use IPC and read data as record batches; I
> don't need Arrow Flight at all (it is the biggest .so file and takes
> around 100 MB). I wonder if you could publish a lite version of pyarrow so
> that I can specify the lite version as the dependency. Alternatively, I could build
> my own lite version and push it to PyPI. However, that approach causes further
> problems if our customer is already using the "fat" version of pyarrow, unless you
> change the namespace of the lite version of pyarrow.
> Another alternative is to bundle pyarrow with our library (copying the
> whole directory into a vendored namespace) and ship it to our customers without
> specifying pyarrow as a dependency. The advantage is that I can
> build pyarrow with whatever options, sub-modules, and libraries I need. However, I
> tried many times and failed, because pyarrow uses absolute imports, which fail
> once the package is moved to the new location.
> Any insight into how I should resolve this issue?
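The observation in the description that .so files are shipped both with and without a version suffix can be checked by hashing the shared libraries in the installed package and grouping byte-identical files. Below is a minimal sketch using only the standard library. The function name `find_duplicate_files` and the `".so"` filename filter are assumptions chosen for illustration; nothing here is part of PyArrow itself.

```python
import hashlib
import os
from collections import defaultdict


def _sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root, name_marker=".so"):
    """Group regular files under root whose names contain name_marker
    and whose contents are byte-identical.

    Returns a list of groups; each group is a sorted list of paths.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name_marker not in name:
                continue
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks cost no extra space, so skip them
            by_digest[_sha256(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicate_files` on the installed `pyarrow/` directory would list any groups of identical libraries, which shows how much of the package size comes from duplicated files rather than from distinct components.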