zeevm commented on pull request #1154: URL: https://github.com/apache/arrow-rs/pull/1154#issuecomment-1030627167
I see a few issues with this. First, the notion that the column chunk is the basic I/O unit for Parquet is somewhat outdated since the introduction of the page index. Second, a major premise of Parquet is "read only what you need", where what you need is usually dictated by some query engine. Continuously downloading data in the background that the client may not even want or need doesn't seem right, especially when the cost is complicating all existing clients with the added "Send" constraint. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
