[GitHub] [arrow-datafusion] tustvold commented on pull request #4525: Avoid reading the entire file in ChunkedStore

GitBox Tue, 06 Dec 2022 07:22:04 -0800


tustvold commented on PR #4525:
URL: https://github.com/apache/arrow-datafusion/pull/4525#issuecomment-1339544973


   Thank you for taking the time to respond. I have a couple of follow-up 
questions to help my understanding. I was actually about to file a PR to gate 
`ChunkedStore` behind `cfg(test)`, so I would like to better understand your 
use case, as I'm clearly missing something.
   
   > In this context, we discovered that ChunkedStore read all the bytes into 
memory and then split it into chunks.
   
   What was the motivation for using `ChunkedStore` over just using the 
standard `LocalFileSystem`? That would return `GetResult::File`, which would 
then synchronously read data from the file in batches automatically?
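
   To make the distinction concrete, here is a minimal standard-library sketch 
of the two read patterns being discussed. It is not the `object_store` 
implementation: the function names `read_all_then_chunk` and `read_in_batches` 
are hypothetical, and the sketch only assumes a local file and a fixed batch 
size.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical illustration of the reported behaviour: load the whole file,
/// then split the buffer into chunks. Peak memory is at least the file size.
fn read_all_then_chunk(path: &Path, chunk_size: usize) -> io::Result<Vec<Vec<u8>>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let data = std::fs::read(path)?;
    Ok(data.chunks(chunk_size).map(|c| c.to_vec()).collect())
}

/// Hypothetical illustration of batched reading: at most `chunk_size` bytes
/// are held at a time, and each batch is handed to `on_chunk`.
/// A single `read` may return fewer bytes than requested, so batches can be
/// shorter than `chunk_size`.
fn read_in_batches<F: FnMut(&[u8])>(
    path: &Path,
    chunk_size: usize,
    mut on_chunk: F,
) -> io::Result<()> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; chunk_size];
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // end of file
        }
        on_chunk(&buf[..n]);
    }
    Ok(())
}
```

   Both functions yield the same bytes in the same order; the difference is 
only in how much of the file is resident in memory at once.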
   
   > If one were to use it to ingest a large file, it would not behave 
reasonably
   
   Agreed, the `ChunkedStore` will only ever make the experience of the 
wrapped `ObjectStore` worse. I wrote it to make the wrapped store controllably 
worse in order to test edge cases in the file format readers; it is actively 
worse to use `ChunkedStore` than not to. Or at least that is what I had 
thought, which is why I am now asking these questions. :sweat_smile: 


