lidavidm commented on issue #36765: URL: https://github.com/apache/arrow/issues/36765#issuecomment-1748860688
Yeah, admittedly pre-buffer was a bit of a hack to minimize changes to the Parquet reader. Ideally, you'd want the Parquet reader to batch its I/O calls (as pre-buffer does) without necessarily caching the results. But from what I remember, the reader isn't designed that way: selecting columns eventually leads to a lot of disparate I/O calls far down the stack, and untangling that would take a fair amount of work, so caching was the easiest option. That's also why the cache doesn't release memory once reading is done: at this level, it's hard to tell when that point has been reached.
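To make the two ideas above concrete (batching many small reads into fewer large ones, and caching what was fetched), here is a minimal Python sketch. It is an illustration under stated assumptions, not Arrow's implementation: `ReadRange`, `coalesce_ranges`, and `RangeCache` are hypothetical names, an in-memory `bytes` object stands in for a remote file, and the `hole_size_limit` / `range_size_limit` knobs only mirror the spirit of the options on Arrow's C++ read range cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRange:
    """A hypothetical byte range request: [offset, offset + length)."""
    offset: int
    length: int

    @property
    def end(self) -> int:
        return self.offset + self.length


def coalesce_ranges(ranges, hole_size_limit, range_size_limit):
    """Merge nearby ranges into fewer, larger reads (the "batching" part).

    Two ranges are merged when the gap between them is at most
    hole_size_limit bytes and the merged range stays within
    range_size_limit bytes. Overlapping ranges (negative gap) merge too.
    """
    if not ranges:
        return []
    ordered = sorted(ranges, key=lambda r: r.offset)
    merged = [ordered[0]]
    for r in ordered[1:]:
        last = merged[-1]
        gap = r.offset - last.end
        new_end = max(last.end, r.end)
        if gap <= hole_size_limit and new_end - last.offset <= range_size_limit:
            merged[-1] = ReadRange(last.offset, new_end - last.offset)
        else:
            merged.append(r)
    return merged


class RangeCache:
    """Fetch coalesced ranges once, then serve small reads from memory.

    Chunks are never evicted: this layer cannot tell when the caller is
    finished with a range, so buffers live until the cache itself is dropped.
    """

    def __init__(self, data: bytes, hole_size_limit: int, range_size_limit: int):
        self._data = data  # stands in for a remote file (hypothetical)
        self._hole = hole_size_limit
        self._limit = range_size_limit
        self._chunks = {}  # ReadRange -> bytes
        self.io_calls = 0  # number of simulated "remote" reads issued

    def cache(self, ranges):
        for r in coalesce_ranges(ranges, self._hole, self._limit):
            self.io_calls += 1
            self._chunks[r] = self._data[r.offset:r.end]

    def read(self, offset: int, length: int) -> bytes:
        for chunk_range, buf in self._chunks.items():
            if chunk_range.offset <= offset and offset + length <= chunk_range.end:
                start = offset - chunk_range.offset
                return buf[start:start + length]
        raise KeyError(f"range ({offset}, {length}) was not cached")


# Usage: three small column-chunk reads become two actual I/O calls.
data = bytes(range(256)) * 4  # 1024 bytes of fake file contents
cache = RangeCache(data, hole_size_limit=16, range_size_limit=512)
cache.cache([ReadRange(0, 10), ReadRange(20, 10), ReadRange(600, 50)])
print(cache.io_calls)  # 2
```

The first two requests are only 10 bytes apart, so they are fetched together; the third is far away and gets its own read. The trade-off the comment describes shows up directly: the merged buffers stay in `_chunks` even after every `read` has been served.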
