felipeblazing commented on PR #45093: URL: https://github.com/apache/arrow/pull/45093#issuecomment-2560217189
The idea is that we would like to be able to know ahead of time what bytes are going to be read from a parquet file without necessarily performing the computation associated with decoding the file at the same time. The high level use case for us is that sometimes we don't have the computational resources to perform decompression and decoding but we do have available I/O. We want to be able to continue to perform I/O when interacting with object stores and other slow sources of parquet data without having to commit to decompression / decoding until a later point in time. This would allow us to move bytes from object stores to local memory where it can wait until we have compute resources for decompression / decoding. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
