asfimport commented on issue #42256: URL: https://github.com/apache/arrow/issues/42256#issuecomment-2184204263
[Wes McKinney](https://issues.apache.org/jira/browse/PARQUET-458?#comment-16855019) / @wesm:

There are multiple issues here that still prevent the library from reading the pages.

In DataPageV2:

- The length prefix for the encoded repetition/definition (rep/def) levels is not included in the data; it is part of the page header, so this logic is incorrect: https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_reader.cc#L53
- The compressed data size in the page header refers only to the portion of the page after the definition_levels_num_bytes and repetition_levels_num_bytes from the page header.

I started working on a patch; I'll see if I can get something up in the next week or so.
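
To illustrate the layout described above, here is a minimal Python sketch of splitting a DataPageV2 body into its three regions. It is not the Arrow C++ reader: `split_data_page_v2` and `DataPageV2Parts` are hypothetical names, zlib stands in for whatever codec the column chunk actually uses, and the parameters are assumed to carry the level byte lengths and the compression flag read from the page header.

```python
import zlib
from typing import Callable, NamedTuple


class DataPageV2Parts(NamedTuple):
    """Hypothetical container for the three regions of a DataPageV2 body."""
    repetition_levels: bytes
    definition_levels: bytes
    values: bytes


def split_data_page_v2(
    body: bytes,
    rep_levels_byte_length: int,
    def_levels_byte_length: int,
    is_compressed: bool = True,
    decompress: Callable[[bytes], bytes] = zlib.decompress,  # stand-in codec (assumption)
) -> DataPageV2Parts:
    """Split a page body using level lengths taken from the page header.

    The levels carry no length prefix inside the body, and only the bytes
    after both level regions are passed to the decompressor.
    """
    levels_len = rep_levels_byte_length + def_levels_byte_length
    if rep_levels_byte_length < 0 or def_levels_byte_length < 0:
        raise ValueError("level byte lengths must be non-negative")
    if levels_len > len(body):
        raise ValueError("level byte lengths exceed the page body")

    rep = body[:rep_levels_byte_length]
    dfn = body[rep_levels_byte_length:levels_len]
    values = body[levels_len:]
    if is_compressed and values:
        values = decompress(values)
    return DataPageV2Parts(rep, dfn, values)


# Build a toy page body: raw levels followed by compressed values.
rep_bytes = b"\x01\x02"
def_bytes = b"\x03\x04\x05"
raw_values = b"values" * 4
page_body = rep_bytes + def_bytes + zlib.compress(raw_values)

parts = split_data_page_v2(page_body, len(rep_bytes), len(def_bytes))
```

The point of the sketch is only the slicing order: both level regions are read as-is using lengths from the header, and decompression starts at the first byte after them.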
