Thanks Raúl for taking care to make this minimally disruptive. This might
be an inconvenience for some users of PyArrow, but I think the benefits
outweigh the inconvenience.

Ian

On Tue, Apr 9, 2024 at 11:17 AM Raúl Cumplido <rau...@apache.org> wrote:

> Hi,
>
> As part of the effort to reduce the footprint of pyarrow
> installations, we have been working on splitting pyarrow into separate
> packages for conda [1]. Each package will pull different C++
> dependencies which will provide different capabilities.
>
> This PR [1] will provide 3 packages for pyarrow:
> pyarrow-core < pyarrow < pyarrow-all
>
> - pyarrow-core: will pull the libarrow.so (~40MB) dependency.
> - pyarrow: in addition to libarrow.so, will also pull libarrow_acero,
> libarrow_dataset, libarrow_substrait and libparquet (~78MB)
> dependencies.
> - pyarrow-all: in addition to everything in pyarrow, will also pull
> libarrow_flight, libarrow_flight_sql and libarrow_gandiva (~97MB).
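>
> For reference, under this split the three install options would look
> roughly like the following (a sketch; package names are taken from the
> PR, and the exact channel setup may differ on your system):

```shell
# Minimal: only libarrow (~40MB); no acero, dataset, parquet or substrait
conda install -c conda-forge pyarrow-core

# Default: adds libarrow_acero, libarrow_dataset, libarrow_substrait
# and libparquet (~78MB of C++ dependencies)
conda install -c conda-forge pyarrow

# Everything: additionally pulls libarrow_flight, libarrow_flight_sql
# and libarrow_gandiva (~97MB of C++ dependencies)
conda install -c conda-forge pyarrow-all
```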
>
> This means that if you use conda to install pyarrow 16.0.0, you will
> see a reduction in the size of the C++ dependencies, but you will no
> longer have access to flight, flight_sql or gandiva. If you want to
> keep using those, you will have to install pyarrow-all.
>
> If you want a minimal pyarrow without access to acero, dataset,
> parquet or substrait, you can use pyarrow-core and get a further
> reduction in size. Bear in mind that the Arrow team is working on
> moving the filesystems out of libarrow, and those will also be pulled
> out of pyarrow-core in the future. This means that, probably as of
> 17.0.0, pyarrow-core will not support the S3, GCS or Azure
> filesystems.
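>
> Code that has to run against any of the three packages can probe for
> the optional components instead of assuming they are present. A
> minimal sketch (the submodule names below are assumed from the current
> pyarrow layout):

```python
import importlib.util


def has_module(name: str) -> bool:
    """Report whether a (sub)module is importable, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # The parent package itself is not installed.
        return False


# Optional components whose availability depends on whether
# pyarrow-core, pyarrow, or pyarrow-all was installed.
OPTIONAL = (
    "pyarrow.parquet",
    "pyarrow.dataset",
    "pyarrow.flight",
    "pyarrow.gandiva",
)

for mod in OPTIONAL:
    print(f"{mod}: {'available' if has_module(mod) else 'missing'}")
```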
>
> The idea is to keep working on these efforts further to reduce pyarrow
> size.
>
> Thanks everyone,
> Raúl
>
> [1] https://github.com/conda-forge/arrow-cpp-feedstock/pull/1255
>