[ https://issues.apache.org/jira/browse/ARROW-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved ARROW-6776.
---------------------------------
Fix Version/s: 1.0.0
Assignee: Wes McKinney
Resolution: Fixed
Yes indeed. I'm closing this as resolved.
> [Python] Need a lite version of pyarrow
> ---------------------------------------
>
> Key: ARROW-6776
> URL: https://issues.apache.org/jira/browse/ARROW-6776
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 0.14.1
> Reporter: Haowei Yu
> Assignee: Wes McKinney
> Priority: Major
> Fix For: 1.0.0
>
>
> Currently I am building a library package on top of pyarrow, so I include
> pyarrow as a dependency and ship it to our customers. However, when a
> customer installs our package, it also installs pyarrow and pyarrow's
> dependency (numpy), and the combined size of these dependencies is huge.
> {code:bash}
> (py36env) [hyu@c6x64-hyu-newuser-final-clone connector]$ ls -l --block-size=M /home/hyu/py36env/lib/python3.6/site-packages/pyarrow/
> total 186M
> {code}
> numpy is around 80 MB, so the total is more than 250 MB.
> Our customers want to bundle all dependencies and run the code inside AWS
> Lambda, but they hit the size limit and the code fails to run.
> Looking into pyarrow, I saw that multiple .so files are shipped both with and
> without a version suffix. I wonder if you could remove one of them (either
> with or without the suffix); that would at least reduce the package size by half.
> Further, our library only needs to use IPC and read data as record batches; I
> don't need Arrow Flight at all (it is the biggest .so file and takes
> around 100 MB). I wonder if you could publish a lite version of pyarrow so
> that I can specify the lite version as the dependency. Alternatively, I could
> build my own lite version and push it to PyPI. However, that approach causes
> further problems if our customer is already using the "fat" version of
> pyarrow, unless you change the namespace of the lite version.
> Another alternative is that I bundle pyarrow with our library (copy the
> whole directory into a vendored namespace) and ship it to our customers without
> specifying pyarrow as a dependency. The advantage of this is that I can
> build pyarrow with whatever options/sub-modules/libraries I need. However, I
> tried many times but failed, because pyarrow uses absolute imports and they
> fail to resolve from the new location.
> Any insight into how I should resolve this issue?
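
The observation above about .so files shipped both with and without a
version suffix can be checked with a short script. What follows is only a
minimal sketch using the Python standard library, not anything taken from
pyarrow or its build tooling; the helper names (file_digest,
find_duplicate_files, wasted_bytes) are hypothetical. It groups regular
files under a directory by size and SHA-256 digest and estimates how many
bytes the extra copies occupy. Symlinks are skipped, on the assumption that
they do not duplicate data in place.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root):
    """Map (size, digest) to the sorted paths of byte-identical files.

    Hypothetical helper: only groups with more than one path are returned.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # a symlink does not add a second copy here
            key = (os.path.getsize(path), file_digest(path))
            groups[key].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}


def wasted_bytes(duplicates):
    """Bytes occupied by every copy beyond the first in each group."""
    return sum(size * (len(paths) - 1) for (size, _digest), paths in duplicates.items())
```

Pointing find_duplicate_files at the site-packages/pyarrow directory from
the listing above would show whether the suffixed and unsuffixed libraries
are in fact full copies, and wasted_bytes gives a rough figure for what
removing one of each pair might save.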
--
This message was sent by Atlassian Jira
(v8.3.4#803005)