raulcd commented on PR #50195: URL: https://github.com/apache/arrow/pull/50195#issuecomment-4777622120
> I think that we should use `LoadFileSystemFactories()` for bindings to avoid loading the S3 module for users who don't need S3. For example, PyArrow users who also want to use the S3 module will install `pyarrow_s3` (or something) explicitly.

With conda this isn't necessary: we already ship each `.so` as a separate package, allowing users to pick and choose what to install. This won't change. If a user installs `pyarrow-core` and `libarrow-s3`, they will have S3 capabilities, the same as installing `pyarrow-core` with `libarrow-flight` today. If `libarrow-s3` is not installed, PyArrow would just raise `ImportError` when it can't find the corresponding `DLL`. Basically, we build with all capabilities turned on but install only the necessary `.so` files, and any that are missing raise `ImportError`. I'll validate that pyarrow fails with `ImportError` if the `.so` isn't present, but this should behave like the other modules.

Wheels are a different beast, and I have to explore a little further. A related issue:
- https://github.com/apache/arrow/issues/38536

The original problem we had with wheels is that there's no mechanism to share dependencies between wheels. Auditwheel/delvewheel/delocate mangle the `.so` names so that libraries bundled in one wheel don't clash with the dependencies of other wheels. The problem is that `libarrow_s3.so` requires `libarrow` symbols, and it's not clear how this `pyarrow_s3` would be shipped:
- Should it include its own `libarrow` using the mangled symbols for the new wheel?
- Should it use the `libarrow` library coming from the main `pyarrow` wheel?
- What happens when different versions of `pyarrow` and `pyarrow_s3` are installed?

As a note, I've just validated that we don't mangle libarrow (or any of our `.so` files) in the wheels.

I am going to start exploring this a little further to see if I can come up with something, even though I am still unclear about some of the questions above, like version matching to avoid ABI problems.
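
To make the version-matching question concrete, here is a minimal sketch of an import-time guard. It is not an existing PyArrow API: the `pyarrow_s3` distribution name comes from the proposal above and doesn't exist today, the helper names are hypothetical, and requiring exact version equality is just one possible policy.

```python
from importlib import metadata


def versions_match(core_version: str, extra_version: str) -> bool:
    # Assumption: the extra wheel must be built against exactly the same
    # release as the core wheel. A real policy might be looser.
    return core_version == extra_version


def require_matching_extra(core_dist: str = "pyarrow",
                           extra_dist: str = "pyarrow_s3") -> str:
    """Raise ImportError unless both distributions are installed at the
    same version. Distribution names are illustrative placeholders."""
    try:
        core_version = metadata.version(core_dist)
        extra_version = metadata.version(extra_dist)
    except metadata.PackageNotFoundError as exc:
        # Same failure mode as a missing optional .so: surface ImportError.
        raise ImportError(f"required distribution not found: {exc}") from exc
    if not versions_match(core_version, extra_version):
        raise ImportError(
            f"{extra_dist} {extra_version} does not match "
            f"{core_dist} {core_version}"
        )
    return core_version
```

A hypothetical extra wheel could call `require_matching_extra()` before loading its shared library, so that a mismatch surfaces as an `ImportError` rather than as an ABI problem at runtime. This only covers the versioning part; it doesn't address how the extra wheel would locate the `libarrow` shipped in the main wheel.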
Related: @amol-, who worked on `consolidatewheels` in the past:
- https://github.com/amol-/consolidatewheels

There are also Python PEP attempts to define external dependencies for wheels under discussion:
https://discuss.python.org/t/pep-725-specifying-external-dependencies-in-pyproject-toml-round-2/103890

What I am saying is that using `LoadFileSystemFactories()` doesn't solve the real problem, which in my opinion is: how do we share a single libarrow between several extra wheels and coordinate versioning?

cc @h-vetinari, who knows this space and might shed some light.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
