tustvold commented on issue #2230:
URL: https://github.com/apache/arrow-rs/issues/2230#issuecomment-1200462036
> You just want to keep it simple, because either object stores or file system just require location to get object. Am I right ?

Yeah, you most definitely shouldn't **need** the object meta to request a file, although the request preconditions (#2241) would optionally allow restrictions based on a limited subset of the metadata, constrained by what is generally supported.

> I think that there should be yet another ObjectStore trait in DataFusion, that should use output of list method as an input to get with some additional flexibility for dynamic attributes.

I agree that this should be handled at the DataFusion layer, although I'm not entirely sure what the interface should look like. Personally I would be reluctant to overload the ObjectStore name, as I think that would get very confusing, but the broad idea of introducing more flexibility into the IO operators sounds like a good thing imo :+1:

One simple option might be to adapt the existing `FormatReader` trait to this purpose or something? :thinking:

Tbh what you really want is to be able to override the `AsyncFileReader` that is passed to `ParquetRecordBatchStreamBuilder` by `ParquetOpener`. This feels like it should be eminently achievable without major rework. _It also occurs to me that having separate ParquetExec, JsonExec, is basically redundant now that we have the `FormatReader` abstraction_.

If you like I'd be happy to raise an issue on DataFusion proposing something to this effect, and depending on feedback potentially bash it out for you? Also happy to leave this with you, just let me know. I'd very much like to help you get this over the line :smile:

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
