raulcd commented on PR #50044: URL: https://github.com/apache/arrow/pull/50044#issuecomment-4657686981
Thanks @kou for the review, I really appreciate you taking the time! > How many S3 specific options do we have? I counted around 13 that we probably don't want to express as part of the URI (credentials, assume-role params, custom providers, connect/request timeouts, a few behavior flags, default_metadata, retry_strategy, sse_customer_key). > If we have only few options, how about implementing them in this PR for easy to review with real use case? If we have many options, how about implementing only a few options in this PR and implementing others in a follow-up PR for easy to review? I can do that, I basically started a follow up PR on top of this one where I started working towards adding them: - https://github.com/apache/arrow/pull/50128 I wanted to have this in isolation for easier initial review but I am happy to push a single PR or add some of those cases here. > BTW, do we need to use `std::any`? Can we reuse existing `arrow::KeyValueMetadata`? The `std::any` case is what @pitrou proposed on the [mailing list](https://lists.apache.org/thread/41b0bofpt7j3f90g0r6xyq3jcj346mqs) and basically was to be able to cater for S3Options that are not string-encodable like retry_strategy with `std::shared_ptr<S3RetryStrategy>` , or default_metadata `std::shared_ptr<const KeyValueMetadata>`. There's also credentials_provider `std::shared_ptr<Aws::Auth::AWSCredentialsProvider>` . The cost is the type-erasure across the shared-library boundary, which I am trying to validate it won't be an issue for complex types on the other PR I pointed above. I've only tested `std::string` on Linux so far as I just started working on it. I should validate it with complex types and not only on Linux but also on other OS, as I've read cases where this might not behave as expected across DLL boundaries. But now that I think more about the added complexity, I am unsure it is worth the hassle because: - `credentials_provider` won't be able to be built from libarrow without AWS headers (needs the SDK) - `default_metadata` is string-encodable (it's a <str:str>). - `S3RetryStrategy` is the only thing `std::any` buys us: letting a libarrow-only caller pass a custom `S3RetryStrategy` subclass, but only if we keep that interface in libarrow which I am unsure we will as this should be part of the s3fs module. We could come up with string encodable `retry_strategy=standard`, `retry_max_attempts=5` @pitrou @kou thoughts about that? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
