[
https://issues.apache.org/jira/browse/ARROW-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladimir updated ARROW-7728:
----------------------------
Description:
Hello,
I'm not sure whether this is intended, but there's no "question" issue
type, so I'm opening it as a bug; please correct if necessary.
Most of the binary files in the Python "pyarrow" package are present twice,
under two names, e.g.:
{code:java}
libarrow.so
libarrow.so.15
{code}
or
{code:java}
libarrow.dylib
libarrow.15.dylib
{code}
(I presume that ".15" corresponds to the version of pyarrow?)
The two files are actually identical:
{code:java}
$ diff libarrow.15.dylib libarrow.dylib # returns nothing
{code}
So let me ask:
- Is it necessary to have both of them in the distribution?
- Which one is actually imported, and is it safe to remove the other one?
Out of 130 MB for the full pyarrow package, 105 MB are these binaries, so
removing the duplicates would save a lot of space (especially important when
using pyarrow in AWS Lambda, where the function size is limited).
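For anyone who wants to reproduce the numbers, here is a minimal sketch of how the duplicates could be found and their total size measured. It groups files by SHA-256 digest instead of running diff on each pair. The helper names (find_duplicate_files, reclaimable_bytes) are made up for illustration and are not part of pyarrow.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicate_files(root):
    """Return lists of paths under `root` whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks: they point at another file and take no extra space.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large shared libraries are not
                # loaded into memory in one piece.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    # Keep only the digests that were seen for more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


def reclaimable_bytes(groups):
    """Bytes that would be saved if only one file per group were kept."""
    return sum(os.path.getsize(paths[0]) * (len(paths) - 1) for paths in groups)
```

Pointing find_duplicate_files at the installed package directory (for example os.path.dirname(pyarrow.__file__)) and passing the result to reclaimable_bytes should show how much of the 105 MB is duplicated. Whether it is safe to delete either copy is a separate question that this check does not answer.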
> Duplicated binaries in the python package
> -----------------------------------------
>
> Key: ARROW-7728
> URL: https://issues.apache.org/jira/browse/ARROW-7728
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.15.1
> Reporter: Vladimir
> Priority: Minor