ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-916116455
> It seems there is also a bug in parquet-cpp which causes an incorrect file offset to be written, see https://issues.apache.org/jira/browse/SPARK-36696, so we'll want to make sure the solution here works for that case as well.

Yep, it does. I took the file that was posted in that JIRA and read it with Spark using Parquet 1.12.0; this indeed fails. After adding this fix to Parquet, the read worked OK. This happens because, for regular files (and most encrypted files), this fix ignores the `RowGroup.offset` field and reverts the offset computation to the pre-1.12 behavior.
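To illustrate why ignoring the row-group offset field makes such files readable again, here is a minimal sketch. It assumes the pre-1.12 rule derives a row group's start position from its first column chunk: the dictionary page offset if one is set and precedes the data page, otherwise the data page offset. The classes and method names below are hypothetical stand-ins for illustration, not the parquet-mr API or the actual code in this PR.

```java
import java.util.List;

public class RowGroupOffsetSketch {

    // Hypothetical stand-in for the column chunk metadata in the file footer.
    static final class ColumnChunkMeta {
        final long dataPageOffset;
        final Long dictionaryPageOffset; // null when the chunk has no dictionary page

        ColumnChunkMeta(long dataPageOffset, Long dictionaryPageOffset) {
            this.dataPageOffset = dataPageOffset;
            this.dictionaryPageOffset = dictionaryPageOffset;
        }
    }

    // Hypothetical stand-in for the row group metadata in the file footer.
    static final class RowGroupMeta {
        final List<ColumnChunkMeta> columns;
        final Long writtenOffset; // the offset field stored by the writer; may be wrong

        RowGroupMeta(List<ColumnChunkMeta> columns, Long writtenOffset) {
            this.columns = columns;
            this.writtenOffset = writtenOffset;
        }
    }

    // Start of a column chunk: the dictionary page if present and earlier,
    // otherwise the first data page.
    static long startOffset(ColumnChunkMeta chunk) {
        if (chunk.dictionaryPageOffset != null
                && chunk.dictionaryPageOffset < chunk.dataPageOffset) {
            return chunk.dictionaryPageOffset;
        }
        return chunk.dataPageOffset;
    }

    // Assumed pre-1.12 style rule: the row group starts where its first column
    // chunk starts. The writer-supplied offset field is deliberately not read.
    static long startOffset(RowGroupMeta rowGroup) {
        return startOffset(rowGroup.columns.get(0));
    }

    public static void main(String[] args) {
        ColumnChunkMeta first = new ColumnChunkMeta(100L, 4L);
        // A writer bug stored a bogus offset; the computed start is unaffected.
        RowGroupMeta rowGroup = new RowGroupMeta(List.of(first), 999_999L);
        System.out.println("computed start offset: " + startOffset(rowGroup));
    }
}
```

Because the start position in this sketch comes only from the column chunk metadata, a wrong value in the row-group offset field cannot affect it, which matches the observation that the file from the JIRA reads correctly once the field is ignored.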
