crepererum commented on issue #18: URL: https://github.com/apache/arrow-rs-object-store/issues/18#issuecomment-3854039340
It seems that DataFusion may want to reuse file descriptors, see https://github.com/apache/datafusion/issues/19983. I am wondering if we should evolve the `object_store` API like this:

- **`open` method:** Add a new method called `open` that returns an abstract handle (not a concrete file descriptor). For file systems the semantics are pretty clear. For cloud stores I think `open` should fix the etag, so that the consumer knows that the file hasn't changed "under their feet". Since reading the etag requires an additional roundtrip to the remote store, I would suggest that we introduce a flag to `open` that controls when the etag is fixed or the file is opened (`eager` = do it during `open`; `lazy` = do it when the first interaction with the handle happens, so the etag read can piggy-back on that request).
- **soft-remove `get`:** Nuke `get` entirely from the core trait, but keep it as part of the extension trait (implemented as `open` + `read`). I think we could even keep the file handle around by carefully downcasting.

@Dandandan what do you think? (Tagging you since you linked the DF ticket in https://github.com/apache/arrow-rs-object-store/pull/628#discussion_r2769144152 )
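To make the eager/lazy distinction concrete, here is a rough sketch of how the two modes could behave. This is not the real `object_store` API: `OpenMode`, `Handle`, `MemStore`, the `Error` variants and the free function `get` are all hypothetical names invented for illustration. The store is a toy in-memory map with a counter standing in for the etag, and everything is synchronous, whereas a real implementation would presumably be async and talk to a remote service.

```rust
use std::collections::HashMap;

/// Hypothetical flag for `open`: when is the etag fixed?
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OpenMode {
    /// Look up the etag during `open` (one extra roundtrip).
    Eager,
    /// Fix the etag on the first interaction with the handle.
    Lazy,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    NotFound,
    /// The object changed after the handle fixed its etag.
    Modified,
}

/// Toy stand-in for a remote store: path -> (etag, bytes).
#[derive(Default)]
pub struct MemStore {
    objects: HashMap<String, (u64, Vec<u8>)>,
}

/// Abstract handle (assumption: it only carries the path and the fixed etag).
#[derive(Debug)]
pub struct Handle {
    path: String,
    etag: Option<u64>,
}

impl MemStore {
    /// Write an object; every write bumps the etag counter.
    pub fn put(&mut self, path: &str, data: &[u8]) {
        let next = self.objects.get(path).map_or(1, |(etag, _)| etag + 1);
        self.objects.insert(path.to_string(), (next, data.to_vec()));
    }

    /// Eager: fix the etag now (and fail if the object is missing).
    /// Lazy: do nothing yet, the first `read` fixes it.
    pub fn open(&self, path: &str, mode: OpenMode) -> Result<Handle, Error> {
        let etag = match mode {
            OpenMode::Eager => Some(self.objects.get(path).ok_or(Error::NotFound)?.0),
            OpenMode::Lazy => None,
        };
        Ok(Handle { path: path.to_string(), etag })
    }
}

impl Handle {
    /// Read the whole object, refusing if it changed since the etag was fixed.
    pub fn read(&mut self, store: &MemStore) -> Result<Vec<u8>, Error> {
        let (current, data) = store.objects.get(&self.path).ok_or(Error::NotFound)?;
        match self.etag {
            Some(fixed) if fixed != *current => Err(Error::Modified),
            _ => {
                // Lazy handles fix the etag here, on first use.
                self.etag = Some(*current);
                Ok(data.clone())
            }
        }
    }
}

/// `get` as it might look in an extension trait: `open` + `read`.
pub fn get(store: &MemStore, path: &str) -> Result<Vec<u8>, Error> {
    store.open(path, OpenMode::Lazy)?.read(store)
}
```

In this sketch, an eager handle opened before an overwrite fails its next `read` with `Error::Modified`, while a lazy handle opened at the same time fixes whatever version exists at its first `read` and only fails on changes after that. A lazy `open` also cannot report a missing object until the first read, which is one trade-off of skipping the extra roundtrip.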
