[GitHub] [arrow-datafusion] ozankabak commented on pull request #4525: Avoid reading the entire file in ChunkedStore

GitBox Tue, 06 Dec 2022 07:02:15 -0800


ozankabak commented on PR #4525:
URL: 
https://github.com/apache/arrow-datafusion/pull/4525#issuecomment-1339516045


   @tustvold, maybe I can provide a little more context: We tried using 
`ChunkedStore` while testing various approaches to streaming execution (i.e. 
using non-pipeline-breaking operators all the way).
   
   One of the approaches we use while doing this is to monitor memory usage 
when processing big files (or "infinite" files like FIFOs). In this context, we 
discovered that `ChunkedStore` reads all the bytes into memory and then splits 
them into chunks. This behavior has two consequences:
   - One cannot use it to test things involving FIFO files, or any unbounded 
file -- this was how we discovered the issue in the first place.
   - If one were to use it to ingest a large file, it would not behave 
reasonably -- this is more of a theoretical issue since its area of intended 
use is probably elsewhere.
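
   To illustrate the difference between the two behaviors, here is a minimal 
sketch using only `std::io::Read`. It is not the actual `ChunkedStore` code; 
`chunks_eager` and `LazyChunks` are hypothetical names made up for this 
example, and `chunk_size` is assumed to be greater than zero.

```rust
use std::io::{self, Read};

/// Eager approach (hypothetical helper): buffer the whole input, then split.
/// `read_to_end` never returns for an unbounded reader such as a FIFO.
fn chunks_eager<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<Vec<Vec<u8>>> {
    let mut all = Vec::new();
    reader.read_to_end(&mut all)?;
    Ok(all.chunks(chunk_size).map(|c| c.to_vec()).collect())
}

/// Lazy approach (hypothetical type): read at most `chunk_size` bytes per step,
/// so memory use is bounded by one chunk regardless of the input size.
struct LazyChunks<R: Read> {
    reader: R,
    chunk_size: usize,
}

impl<R: Read> Iterator for LazyChunks<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = vec![0u8; self.chunk_size];
        let mut filled = 0;
        // A single `read` call may return fewer bytes than requested, so keep
        // reading until the chunk is full or the input ends.
        while filled < self.chunk_size {
            match self.reader.read(&mut buf[filled..]) {
                Ok(0) => break,
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Some(Err(e)),
            }
        }
        if filled == 0 {
            None
        } else {
            buf.truncate(filled);
            Some(Ok(buf))
        }
    }
}
```

   On a bounded input both approaches yield the same chunks; only the lazy 
one can make progress on an unbounded reader (e.g. `std::io::repeat`), since 
it never waits for end-of-input before producing the first chunk.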
   
   All in all, this small change keeps the existing behavior of 
`ChunkedStore` intact and makes it usable in the above scenarios too. Since it 
neither changes the API nor alters chunk contents, it doesn't result in any 
behavioral changes for already existing use cases.
   
   Thanks for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
