raulcd commented on PR #50044:
URL: https://github.com/apache/arrow/pull/50044#issuecomment-4657686981

   Thanks @kou for the review, I really appreciate you taking the time!
   
   > How many S3 specific options do we have? 
   
   I counted around 13 that we probably don't want to express as part of the 
URI (credentials, assume-role params, custom providers, connect/request 
timeouts, a few behavior flags, default_metadata, retry_strategy, 
sse_customer_key).
   
   > If we have only few options, how about implementing them in this PR for 
easy to review with real use case? If we have many options, how about 
implementing only a few options in this PR and implementing others in a 
follow-up PR for easy to review?
   
   I can do that, I basically started a follow up PR on top of this one where I 
started working towards adding them:
   - https://github.com/apache/arrow/pull/50128
   
   I wanted to have this in isolation for easier initial review but I am happy 
to push a single PR or add some of those cases here.
   
   > BTW, do we need to use `std::any`? Can we reuse existing 
`arrow::KeyValueMetadata`?
   
   The `std::any` case is what @pitrou  proposed on the [mailing 
list](https://lists.apache.org/thread/41b0bofpt7j3f90g0r6xyq3jcj346mqs) and 
basically was to be able to cater for S3Options that are not string-encodable 
like retry_strategy with `std::shared_ptr<S3RetryStrategy>` , or 
default_metadata `std::shared_ptr<const 
   KeyValueMetadata>`. 
   
   There's also credentials_provider  
`std::shared_ptr<Aws::Auth::AWSCredentialsProvider>` .
   
   The cost is the type-erasure across the shared-library boundary, which I am 
trying to validate it won't be an issue for complex types on the other PR I 
pointed above. I've only tested `std::string` on Linux so far as I just started 
working on it. I should validate it with complex types and not only on Linux 
but also on other OS, as I've read cases where this might not behave as 
expected across DLL boundaries.
   
   But now that I think more about the added complexity, I am unsure it is 
worth the hassle because:
   - `credentials_provider` won't be able to be built from libarrow without AWS 
headers (needs the SDK)
   - `default_metadata` is string-encodable (it's a <str:str>).
   - `S3RetryStrategy` is the only thing `std::any` buys us: letting a 
libarrow-only caller pass a custom `S3RetryStrategy` subclass, but only if we 
keep that interface in libarrow which I am unsure we will as this should be 
part of the s3fs module. We could come up with string encodable 
`retry_strategy=standard`,  `retry_max_attempts=5`
   
   @pitrou @kou thoughts about that?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to