[ 
https://issues.apache.org/jira/browse/ARROW-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026793#comment-17026793
 ] 

Antoine Pitrou commented on ARROW-7728:
---------------------------------------

If you are not doing C++ development against these libraries, you can remove the 
libraries without a version number (such as "libarrow.so", "libgandiva.so", ...).
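
Before deleting anything, it may help to confirm which unversioned files really are byte-for-byte copies of a versioned sibling, as the diff in the report suggests. The following is a minimal sketch, not part of pyarrow: the helper names `file_digest` and `find_unversioned_duplicates` are hypothetical, and it assumes the libraries follow the `libfoo.so` / `libfoo.so.15` (Linux) or `libfoo.dylib` / `libfoo.15.dylib` (macOS) naming shown in the issue. It only lists candidates and deletes nothing.

```python
import hashlib
import re
from pathlib import Path

# Assumed naming scheme: libarrow.so.15 (Linux) or libarrow.15.dylib (macOS).
VERSIONED = re.compile(
    r"^(?P<stem>.+?)(?:\.so\.\d+(?:\.\d+)*|\.\d+(?:\.\d+)*\.dylib)$"
)


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def find_unversioned_duplicates(lib_dir):
    """List unversioned libraries identical in content to a versioned sibling."""
    lib_dir = Path(lib_dir)

    # Collect digests of versioned libraries, keyed by their base name.
    versioned_digests = {}
    for p in lib_dir.iterdir():
        m = VERSIONED.match(p.name)
        if m and p.is_file():
            versioned_digests.setdefault(m.group("stem"), set()).add(file_digest(p))

    # Report unversioned .so/.dylib files whose content matches a sibling.
    duplicates = []
    for p in sorted(lib_dir.iterdir()):
        if p.suffix not in (".so", ".dylib") or VERSIONED.match(p.name):
            continue
        stem = p.name[: -len(p.suffix)]
        if p.is_file() and file_digest(p) in versioned_digests.get(stem, set()):
            duplicates.append(p)
    return duplicates
```

One way to use it would be to pass the installed package directory, for example `find_unversioned_duplicates(Path(pyarrow.__file__).parent)`, and review the printed paths by hand. Python extension modules in the same directory are skipped because they have no versioned sibling.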

> Duplicated binaries in the python package
> -----------------------------------------
>
>                 Key: ARROW-7728
>                 URL: https://issues.apache.org/jira/browse/ARROW-7728
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.1
>            Reporter: Vladimir
>            Priority: Minor
>
> Hello,
>  
> I'm not sure if it is a desired feature or not, but there's no "question" 
> issue type, so I'm opening it as a bug - please correct if necessary.
>  
> Most of the binary files in the Python "pyarrow" package are present in two 
> versions, e.g.:
>  
> {code:java}
> libarrow.so
> libarrow.so.15
> {code}
> or  
> {code:java}
> libarrow.dylib
> libarrow.15.dylib
> {code}
> (I presume that ".15" corresponds to the version of pyarrow?)
> The two files are actually identical:
> {code:java}
> $ diff libarrow.15.dylib libarrow.dylib  # returns nothing
> {code}
> So let me ask:
>  - Is it necessary to have both of them in the distribution?
>  - Which one is actually imported, and is it safe to remove the other one?
>  
> Out of 130 MB of full pyarrow, 105 MB are those binaries, so removing 
> duplicates would save quite some space (especially important if using pyarrow 
> in AWS lambdas where the function is limited in size). 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
