ozankabak commented on PR #4525: URL: https://github.com/apache/arrow-datafusion/pull/4525#issuecomment-1339516045
@tustvold, maybe I can provide a little more context: we tried using `ChunkedStore` while testing various approaches to streaming execution (i.e. using non-pipeline-breaking operators all the way). One of the approaches we use is to monitor memory usage while processing big files (or "infinite" files like FIFOs). In this context, we discovered that `ChunkedStore` read all the bytes into memory and only then split them into chunks. This behavior has two consequences:

- One cannot use it to test anything involving FIFO files, or any unbounded file. This is how we discovered the issue in the first place.
- If one were to use it to ingest a large file, it would not behave reasonably. This is more of a theoretical issue, since its intended area of use is probably elsewhere.

All in all, this small change does not alter the chunks `ChunkedStore` produces, and it makes the store usable in the above scenarios too. Since it neither changes the API nor changes the chunk contents, it causes no behavioral changes for existing use cases.

Thanks for the review!
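To make the difference concrete, here is a minimal, hypothetical sketch of "chunk while reading" as opposed to "read everything, then chunk". It is not `ChunkedStore`'s code (which works with asynchronous object-store streams); `Chunker` is an illustrative name, and the sketch uses only the synchronous `std::io::Read` trait. The point it shows is that memory stays bounded by one chunk, so an unbounded source such as a FIFO can be consumed incrementally instead of never finishing the initial "read all bytes" step.

```rust
use std::io::{self, Read};

/// Hypothetical adapter (not the real ChunkedStore): yields fixed-size
/// chunks from any reader, holding at most one chunk in memory at a time.
struct Chunker<R: Read> {
    reader: R,
    chunk_size: usize,
    done: bool,
}

impl<R: Read> Chunker<R> {
    fn new(reader: R, chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        Chunker { reader, chunk_size, done: false }
    }
}

impl<R: Read> Iterator for Chunker<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let mut buf = vec![0u8; self.chunk_size];
        let mut filled = 0;
        // A single read() may return fewer bytes than requested (common on
        // FIFOs and pipes), so keep reading until the chunk is full or EOF.
        while filled < self.chunk_size {
            match self.reader.read(&mut buf[filled..]) {
                Ok(0) => {
                    // EOF: emit whatever is buffered, then stop.
                    self.done = true;
                    break;
                }
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => {
                    self.done = true;
                    return Some(Err(e));
                }
            }
        }
        if filled == 0 {
            return None;
        }
        buf.truncate(filled);
        Some(Ok(buf))
    }
}

fn main() -> io::Result<()> {
    // Any `Read` works here: a File, a FIFO opened as a File, or stdin.
    let data: &[u8] = b"hello streaming world";
    let chunks: Vec<Vec<u8>> = Chunker::new(data, 8).collect::<io::Result<_>>()?;
    for c in &chunks {
        println!("{:?}", String::from_utf8_lossy(c));
    }
    Ok(())
}
```

With a chunk size of 8, the 21-byte input above yields the chunks `"hello st"`, `"reaming "` and `"world"`: the same contents an eager read-then-split approach would produce, but without ever materializing the whole input.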
