raulcd commented on PR #50195:
URL: https://github.com/apache/arrow/pull/50195#issuecomment-4777622120

   > I think that we should use `LoadFileSystemFactories()` for bindings to 
avoid loading the S3 module for users who don't need S3. For example, PyArrow 
users who also want to use the S3 module, they will install `pyarrow_s3` (or 
something) explicitly.
   
   With conda this isn't necessary: we already ship each `.so` as a separate 
package, so users can pick and choose what to install. This won't change. If a 
user installs `pyarrow-core` together with `libarrow-s3`, they will have S3 
capabilities, the same as installing `pyarrow-core` with `libarrow-flight` 
today. If `libarrow-s3` is not installed, PyArrow would just raise 
`ImportError` when it cannot find the corresponding shared library. Basically, 
we build with all capabilities turned on but install only the necessary `.so` 
files, and importing a module whose `.so` is missing raises `ImportError`. I'll 
validate that PyArrow fails with `ImportError` if the `.so` isn't present, but 
this should behave like the other modules.
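
   As a rough illustration of the behaviour described above, here is a minimal 
sketch of probing for an optional module and treating `ImportError` as "not 
installed". The helper name `optional_import` is hypothetical and not part of 
PyArrow; the demo uses standard-library module names only.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported.

    Hypothetical helper: an ImportError is interpreted as "the optional
    component (for example, its shared library) is not installed".
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# A module that exists is returned; a missing one yields None.
present = optional_import("json")
missing = optional_import("module_that_is_not_installed_xyz")
```

   With PyArrow the equivalent probe would presumably be something like 
`from pyarrow.fs import S3FileSystem` wrapped in the same `try`/`except`, 
assuming the S3 module keeps failing at import time when its library is absent.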
   
   Wheels are a different beast and I have to explore a little further. A 
related issue:
   - https://github.com/apache/arrow/issues/38536
   
   The original problem we had with wheels is that there's no mechanism to 
share dependencies between wheels. Auditwheel/delvewheel/delocate mangle the 
`.so` names so that libraries bundled in one wheel don't clash with the symbols 
of libraries bundled in another. The problem is that `libarrow_s3.so` requires 
`libarrow` symbols, and it's not clear how this `pyarrow_s3` would be shipped. 
Should it include its own `libarrow`, mangled for the new wheel? Should it use 
the `libarrow` library coming from the main `pyarrow` wheel? What happens when 
different versions of `pyarrow` and `pyarrow_s3` are installed?
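
   To make the name mangling concrete, here is a small sketch that lists the 
shared libraries inside a wheel and flags names that look hash-mangled. It 
assumes the repair tool inserts a hex hash before the extension (for example 
`libfoo-1a2b3c4d.so.1`); the regular expressions are a heuristic of my own and 
the function names are hypothetical, not part of any of those tools.

```python
import re
import zipfile

# Heuristic (assumption): a repaired library carries "-<8+ hex digits>"
# right before its ".so" / ".dll" / ".dylib" extension.
_HASH_SUFFIX = re.compile(r"-[0-9a-f]{8,}\.(?:so|dll|dylib)")
# File names ending in .so, .so.<digits>..., .dll or .dylib.
_SHARED_LIB = re.compile(r"\.(?:so(?:\.\d+)*|dll|dylib)$")


def shared_libs_in_wheel(wheel):
    """Return the shared-library entries of a wheel (path or file object)."""
    with zipfile.ZipFile(wheel) as archive:
        return [name for name in archive.namelist() if _SHARED_LIB.search(name)]


def looks_mangled(filename):
    """Return True if the file name appears to carry a hash suffix."""
    return _HASH_SUFFIX.search(filename) is not None
```

   Running `looks_mangled` over the output of `shared_libs_in_wheel` gives a 
quick, if approximate, view of which bundled libraries were renamed.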
   
   As a note, I've just validated that we don't mangle `libarrow` (or any of 
our `.so` files) in the wheels. I am going to explore this a little further to 
see if I can come up with something, even though I am still unclear about some 
of the questions above, such as version matching to avoid ABI problems.
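
   One possible shape for the version matching, purely as a sketch: an extra 
wheel could compare its own version with the installed `pyarrow` version at 
import time and refuse to load on a mismatch. The distribution name 
`pyarrow_s3` is the hypothetical one from this thread, and requiring exact 
equality is an assumption, not a decided policy.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_matching_versions(core_version, extra_version):
    """Raise ImportError unless both versions are identical.

    Assumption: exact equality is required, because the extra wheel would
    link against the libarrow shipped by the core wheel.
    """
    if core_version != extra_version:
        raise ImportError(
            f"version mismatch: core is {core_version}, extra is {extra_version}"
        )


# Hypothetical use inside the extra package's __init__.py:
# check_matching_versions(installed_version("pyarrow"),
#                         installed_version("pyarrow_s3"))
```

   This only detects a mismatch after installation; it does not answer how the 
two wheels would share a single `libarrow` in the first place.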
   
   Related: @amol- who worked on `consolidatewheels` in the past:
   - https://github.com/amol-/consolidatewheels
   
   There is also a PEP under discussion that attempts to define external 
dependencies for wheels:
   - https://discuss.python.org/t/pep-725-specifying-external-dependencies-in-pyproject-toml-round-2/103890
   
   What I am saying is that using `LoadFileSystemFactories()` doesn't solve the 
real problem, which in my opinion is: how do we share a single `libarrow` 
between several extra wheels and coordinate versioning?
   
   cc @h-vetinari who knows this space and might shed some light


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
