asfimport commented on issue #42256:
URL: https://github.com/apache/arrow/issues/42256#issuecomment-2184204263

   [Wes McKinney](https://issues.apache.org/jira/browse/PARQUET-458?#comment-16855019) / @wesm:
   There are multiple issues here that still prevent the library from reading
the pages:
   
   In DataPageV2
   
   - the length prefix for the encoded rep/def levels is not included in the
data; it is part of the page header, so this logic is incorrect:
https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_reader.cc#L53
   - the compressed data size in the page header refers only to the portion of 
the page after the definition_levels_num_bytes and repetition_levels_num_bytes 
from the page header
     
     I started working on a patch. I'll see if I can get something up in the
next week or so.
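
     To make the layout difference concrete, here is a minimal sketch (not
the actual parquet-cpp code). It assumes a V1 page stores RLE-encoded levels
behind a 4-byte little-endian length prefix, while a DataPageV2 body holds the
repetition levels, then the definition levels, then the (possibly compressed)
values, with both level sizes taken from the page header. `V2LevelSizes`,
`split_v1_levels` and `split_v2_page` are hypothetical names used only for
illustration.

```python
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass
class V2LevelSizes:
    """Hypothetical stand-in for the two byte counts carried in the page header."""
    repetition_levels_num_bytes: int
    definition_levels_num_bytes: int


def split_v1_levels(buf: bytes) -> Tuple[bytes, bytes]:
    """V1 style: the level bytes are preceded by a 4-byte little-endian length."""
    if len(buf) < 4:
        raise ValueError("buffer too short to hold a length prefix")
    (num_bytes,) = struct.unpack_from("<i", buf, 0)
    if num_bytes < 0 or 4 + num_bytes > len(buf):
        raise ValueError("length prefix does not fit in the buffer")
    # Return the level bytes and whatever follows them.
    return buf[4:4 + num_bytes], buf[4 + num_bytes:]


def split_v2_page(body: bytes, sizes: V2LevelSizes) -> Tuple[bytes, bytes, bytes]:
    """V2 style: no prefix in the data; the sizes come from the page header."""
    rep_len = sizes.repetition_levels_num_bytes
    def_len = sizes.definition_levels_num_bytes
    if rep_len < 0 or def_len < 0 or rep_len + def_len > len(body):
        raise ValueError("level sizes from the header exceed the page body")
    rep_end = rep_len
    def_end = rep_end + def_len
    # Repetition levels, definition levels, then the remaining values section.
    return body[:rep_end], body[rep_end:def_end], body[def_end:]
```

     Only the last slice returned by `split_v2_page` would then be handed to
the decompressor, which is why reading a length prefix from the data, as the
V1 path does, misparses a V2 page.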


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
