tustvold commented on issue #2205: URL: https://github.com/apache/arrow-datafusion/issues/2205#issuecomment-1097930579
I think we all agree on where we would like to end up; however, I worry we are trying to run before we can walk here. I would much prefer an approach that does the simplest thing possible, namely downloading the entire file, and then iteratively adds functionality, such as fetching to memory, selective fetching, etc. Currently we have an approach that isn't really very effective at either.

> the same way local filesystems do

I'm not sure this is a fair comparison: object storage has vastly different performance and billing characteristics from a local filesystem.

> Add a "prefetch_hint" certain offsets to the ObjectStore API and make the parquet reader call it

Why would you implement this in the ObjectStore API, and not in some component generic over object stores? The caching, spilling logic, etc. is not going to vary based on object store provider. An ObjectStore API that supports fetch requests with an optional byte range should have us covered?
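As a rough illustration of the split suggested above, here is a minimal sketch in Rust. It assumes a hypothetical `ObjectStore` trait with a single `fetch` call taking an optional byte range; this is not the actual DataFusion trait. `InMemoryStore`, `WholeFileCache`, and `slice` are likewise made-up names. The wrapper does the simplest thing possible (download the entire object once) and is generic over any store, so the caching logic lives outside the provider-specific code.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::io;
use std::ops::Range;

/// Hypothetical minimal interface (an assumption, not the DataFusion API):
/// one fetch call with an optional byte range. `None` means the whole object.
pub trait ObjectStore {
    fn fetch(&self, path: &str, range: Option<Range<usize>>) -> io::Result<Vec<u8>>;
}

/// Copy either the whole buffer or the requested byte range out of `data`.
fn slice(data: &[u8], range: Option<Range<usize>>) -> io::Result<Vec<u8>> {
    match range {
        None => Ok(data.to_vec()),
        Some(r) => data
            .get(r)
            .map(|s| s.to_vec())
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "range out of bounds")),
    }
}

/// Toy in-memory store standing in for a real provider (S3, GCS, ...).
#[derive(Default)]
pub struct InMemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl InMemoryStore {
    pub fn put(&mut self, path: &str, data: &[u8]) {
        self.objects.insert(path.to_string(), data.to_vec());
    }
}

impl ObjectStore for InMemoryStore {
    fn fetch(&self, path: &str, range: Option<Range<usize>>) -> io::Result<Vec<u8>> {
        let data = self
            .objects
            .get(path)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))?;
        slice(data, range)
    }
}

/// Provider-independent wrapper: on first access it downloads the entire
/// object, then serves every later request (ranged or not) from memory.
pub struct WholeFileCache<S: ObjectStore> {
    inner: S,
    cache: RefCell<HashMap<String, Vec<u8>>>,
    downloads: Cell<usize>,
}

impl<S: ObjectStore> WholeFileCache<S> {
    pub fn new(inner: S) -> Self {
        Self {
            inner,
            cache: RefCell::new(HashMap::new()),
            downloads: Cell::new(0),
        }
    }

    /// Number of full-object downloads issued to the wrapped store.
    pub fn downloads(&self) -> usize {
        self.downloads.get()
    }
}

impl<S: ObjectStore> ObjectStore for WholeFileCache<S> {
    fn fetch(&self, path: &str, range: Option<Range<usize>>) -> io::Result<Vec<u8>> {
        let mut cache = self.cache.borrow_mut();
        if !cache.contains_key(path) {
            // Simplest strategy: always pull the whole object, ignoring `range`.
            let data = self.inner.fetch(path, None)?;
            self.downloads.set(self.downloads.get() + 1);
            cache.insert(path.to_string(), data);
        }
        slice(&cache[path], range)
    }
}
```

Because `WholeFileCache` itself implements the same trait, a reader would not need to know whether it is talking to a raw store or a cached one. Selective fetching or spilling to disk could later replace the whole-file strategy inside the wrapper without touching any provider implementation.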
