[GitHub] [arrow] westonpace commented on issue #13030: [JAVA] Is any way reading partial parquet file into arrow

GitBox Fri, 29 Apr 2022 13:21:34 -0700


westonpace commented on issue #13030:
URL: https://github.com/apache/arrow/issues/13030#issuecomment-1113692280


   A parquet file is made up of row groups, columns, and pages.  A page is 
indivisible as it represents a compressed buffer.  There is no way to read a 
part of a page and so it cannot be sliced.
   
   However, partitioning file access by byte range is still a popular idea.  
One way to handle this is to return every row group whose first byte falls in 
the requested range.
   
   For example, if a parquet file has 10 row groups of 900,000 bytes each and 
you ask for the range [2000000,3000000], you would get the 4th row group (the 
only one whose first byte, at 2,700,000, falls in that range).
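
   The selection rule above can be sketched in a few lines of Java.  This is 
only an illustration, not code from Arrow or Parquet: the class 
`RowGroupSelector`, its methods, and the idea of passing the row group start 
offsets as a plain `long[]` are all hypothetical.  It also assumes the range 
is half-open (start inclusive, end exclusive), so that a row group starting 
exactly on a boundary is not returned for two adjacent ranges.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: pick the row groups whose first byte falls inside a byte range. */
final class RowGroupSelector {

    /**
     * @param startOffsets hypothetical input: file offset of each row group's first byte
     * @param rangeStart   first byte of the requested range (inclusive)
     * @param rangeEnd     end of the requested range (exclusive; an assumption)
     * @return zero-based indices of the selected row groups
     */
    static List<Integer> select(long[] startOffsets, long rangeStart, long rangeEnd) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < startOffsets.length; i++) {
            long start = startOffsets[i];
            // Only the first byte matters; the row group may extend past rangeEnd.
            if (start >= rangeStart && start < rangeEnd) {
                selected.add(i);
            }
        }
        return selected;
    }

    /** Offsets for equally sized row groups laid out back to back from byte 0. */
    static long[] uniformOffsets(int count, long rowGroupBytes) {
        long[] offsets = new long[count];
        for (int i = 0; i < count; i++) {
            offsets[i] = i * rowGroupBytes;
        }
        return offsets;
    }
}
```

   With `uniformOffsets(10, 900_000L)` and the range 2,000,000 to 3,000,000, 
`select` returns only zero-based index 3, i.e. the 4th row group.  Because 
each row group has exactly one first byte, splitting a file into 
non-overlapping ranges this way assigns every row group to exactly one range.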


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

