pentschev commented on code in PR #38375:
URL: https://github.com/apache/arrow/pull/38375#discussion_r1369708715
##########
python/pyarrow/_s3fs.pyx:
##########
@@ -260,6 +271,26 @@ cdef class S3FileSystem(FileSystem):
load_frequency=900, proxy_options=None,
allow_bucket_creation=False, allow_bucket_deletion=False,
retry_strategy: S3RetryStrategy =
AwsStandardS3RetryStrategy(max_attempts=3)):
+ ensure_s3_initialized()
+
+ self._initialize_s3(access_key=access_key, secret_key=secret_key,
session_token=session_token,
+ anonymous=anonymous, region=region,
request_timeout=request_timeout,
+ connect_timeout=connect_timeout, scheme=scheme,
endpoint_override=endpoint_override,
+ background_writes=background_writes,
default_metadata=default_metadata,
+ role_arn=role_arn, session_name=session_name,
external_id=external_id,
+ load_frequency=load_frequency,
proxy_options=proxy_options,
+ allow_bucket_creation=allow_bucket_creation,
allow_bucket_deletion=allow_bucket_deletion,
+ retry_strategy=retry_strategy)
+
+ def _initialize_s3(self, *, access_key=None, secret_key=None,
session_token=None,
+ bint anonymous=False, region=None, request_timeout=None,
+ connect_timeout=None, scheme=None,
endpoint_override=None,
+ bint background_writes=True, default_metadata=None,
+ role_arn=None, session_name=None, external_id=None,
+ load_frequency=900, proxy_options=None,
+ allow_bucket_creation=False,
allow_bucket_deletion=False,
+ retry_strategy: S3RetryStrategy =
AwsStandardS3RetryStrategy(max_attempts=3)):
+
Review Comment:
The reason I had to move the implementation to the `_initialize_s3` method
is the `cdef` declarations at
https://github.com/apache/arrow/pull/38375/files#diff-afa3ea99a387be221ef1f7230aa309b42001aed318cdc6969e700d5eb04d07b2R294-R296.
They instantiate objects that expect S3 to be already initialized (which
is only guaranteed after `ensure_s3_initialized()` is called), and Cython
instantiates them before any code in the function runs, including
`ensure_s3_initialized()`.
An alternative would be to make them `unique_ptr`s, or something else that
won't instantiate the objects immediately at `__init__()`'s entry, but I think
this would make the code more complex than the solution currently proposed. If
a pointer is preferred here, without deferring to another method, I can work on
that.
##########
python/pyarrow/fs.py:
##########
@@ -57,10 +57,6 @@
finalize_s3, initialize_s3, resolve_s3_region)
except ImportError:
_not_imported.append("S3FileSystem")
-else:
- ensure_s3_initialized()
- import atexit
- atexit.register(finalize_s3)
Review Comment:
That sounds like a good idea. Let me try that locally, and if the tests
succeed I'll push the change to this PR.