westonpace commented on issue #13030: URL: https://github.com/apache/arrow/issues/13030#issuecomment-1113692280
A Parquet file is made up of row groups, column chunks, and pages. A page is indivisible because it is a single compressed buffer: it has to be decompressed as a whole, so there is no way to read part of a page and it cannot be sliced. However, partitioning file access by byte range is still a popular idea. One way to handle this is to return every row group whose first byte falls in the requested range. For example, if a Parquet file has 10 row groups of 900,000 bytes each and you ask for the range [2000000,3000000], you would get the 4th row group (index 3 when counting from zero), which starts at byte 2,700,000.
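A minimal sketch of that selection rule in plain Python, to make the arithmetic concrete. The function name `row_groups_in_range` and the list of start offsets are hypothetical and not part of any Arrow API. The sketch also assumes a half-open range `[start, end)`, so that adjacent ranges never claim the same row group, and it ignores the file header and footer for simplicity.

```python
def row_groups_in_range(row_group_starts, range_start, range_end):
    """Return indices of row groups whose first byte is in [range_start, range_end).

    row_group_starts: byte offset of the first byte of each row group.
    The half-open interval is an assumption made for this sketch.
    """
    return [
        index
        for index, start in enumerate(row_group_starts)
        if range_start <= start < range_end
    ]


# The example from the comment: 10 row groups of 900,000 bytes each.
ROW_GROUP_SIZE = 900_000
starts = [i * ROW_GROUP_SIZE for i in range(10)]

selected = row_groups_in_range(starts, 2_000_000, 3_000_000)
print(selected)  # [3], i.e. the row group starting at byte 2,700,000

# Splitting the whole file into 1,000,000-byte ranges assigns every
# row group to exactly one range, even though no row group is sliced.
file_size = ROW_GROUP_SIZE * len(starts)
assignment = [
    row_groups_in_range(starts, lo, lo + 1_000_000)
    for lo in range(0, file_size, 1_000_000)
]
print(assignment)
```

Because each row group has exactly one first byte, any set of non-overlapping ranges that covers the file reads every row group once, although individual ranges may receive zero, one, or several row groups.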
