pentschev opened a new pull request, #38375:
URL: https://github.com/apache/arrow/pull/38375

   ### Rationale for this change
   
   In accordance with https://github.com/apache/arrow/issues/38364, we believe 
that, for several reasons (shorter import time, avoiding unnecessary resource 
consumption, and avoiding potential bugs in the S3 library), S3 resources 
should not be initialized at import time. This PR defers that step to first 
use.
   
   ### What changes are included in this PR?
   
   - Remove the calls to `ensure_s3_initialized()` that until now were executed 
during `import pyarrow.fs`;
   - Move the `ensure_s3_initialized()` calls to the `python/pyarrow/_s3fs.pyx` 
module;
   - Add a global flag that records whether S3 has already been initialized and 
the `atexit` handlers registered.
   
   ### Are these changes tested?
   
   Yes. The existing S3 tests depend on S3 having been initialized; otherwise 
they fail with a C++ exception.
   
   ### Are there any user-facing changes?
   
   No. The behavior is slightly different in that S3 initialization no longer 
happens immediately when `pyarrow.fs` is imported, but users who rely on the 
public API alone should see no changes.
   
   **This PR contains a "Critical Fix".**
   A bug in aws-sdk-cpp reported in 
https://github.com/aws/aws-sdk-cpp/issues/2681 causes segmentation faults under 
specific circumstances when Python processes shut down. It has been observed 
specifically with Dask+GPUs (so far we have been unable to pinpoint the exact 
correlation between Dask, GPUs and S3). While this does not seem to affect all 
users and does not originate in Arrow, it can affect use cases that do not 
depend on S3 at all. This is particularly problematic in CI, where all tests 
pass successfully but the process crashes at shutdown.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
