liukun4515 commented on PR #1947: URL: https://github.com/apache/arrow-rs/pull/1947#issuecomment-1168179198
> I can't help wondering if this was an oversight in the original parquet specification, not collocating column chunk metadata in the footer, that has since been papered over. All readers I can find simply read the ColumnChunkMetadata from the footer and ignore everything else.

I share your confusion about this metadata. I went through parquet-mr (the Java implementation): it doesn't append this metadata at the end of each column chunk, and it reads the metadata from the FileMetaData in the footer instead. However, according to the format definition (https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/src/main/thrift/parquet.thrift#L790), `file_offset` is a required field, while ColumnMetaData (https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/src/main/thrift/parquet.thrift#L796) is an optional field.
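To make the requiredness mismatch concrete, here is a minimal sketch in Rust of how the `ColumnChunk` fields line up in the linked `parquet.thrift`: `file_path` optional, `file_offset` required, `meta_data` optional. The types and the `metadata_source` helper are hypothetical and heavily simplified (they are not the `parquet` crate's generated types). The sketch also assumes, as the thread suggests, that a chunk without a footer copy of its ColumnMetaData would force a reader to seek to `file_offset`, whereas readers in practice just use the footer copy.

```rust
// Hypothetical, simplified mirror of the Thrift `ColumnChunk` struct.
// Not the `parquet` crate's generated types; most fields are omitted.
#![allow(dead_code)]

/// A small, illustrative subset of the Thrift `ColumnMetaData` fields.
#[derive(Debug, Clone, PartialEq)]
struct ColumnMetaData {
    num_values: i64,
    total_compressed_size: i64,
    data_page_offset: i64,
}

/// Mirrors the field requiredness in parquet.thrift:
///   1: optional string file_path
///   2: required i64 file_offset
///   3: optional ColumnMetaData meta_data
#[derive(Debug, Clone)]
struct ColumnChunk {
    file_path: Option<String>,         // optional: chunk may live in another file
    file_offset: i64,                  // required: always present
    meta_data: Option<ColumnMetaData>, // optional according to the spec
}

/// Where a reader would obtain a chunk's metadata (hypothetical helper).
#[derive(Debug, PartialEq)]
enum MetadataSource<'a> {
    /// Copy embedded in the footer's FileMetaData (what readers use in practice).
    Footer(&'a ColumnMetaData),
    /// No footer copy: the reader would have to seek to `file_offset` (assumption).
    SeekTo(i64),
}

fn metadata_source(chunk: &ColumnChunk) -> MetadataSource<'_> {
    match &chunk.meta_data {
        Some(md) => MetadataSource::Footer(md),
        None => MetadataSource::SeekTo(chunk.file_offset),
    }
}

fn main() {
    let md = ColumnMetaData { num_values: 100, total_compressed_size: 2048, data_page_offset: 4 };
    let with_footer = ColumnChunk { file_path: None, file_offset: 4, meta_data: Some(md.clone()) };
    let without_footer = ColumnChunk { file_path: None, file_offset: 2052, meta_data: None };

    // A chunk carrying its metadata in the footer never needs `file_offset`.
    assert_eq!(metadata_source(&with_footer), MetadataSource::Footer(&md));
    // Only when the optional field is absent does the required offset matter.
    assert_eq!(metadata_source(&without_footer), MetadataSource::SeekTo(2052));
    println!("ok");
}
```

The point of the sketch is that the spec's requiredness is the reverse of how readers behave: the field every reader relies on is optional, and the field they ignore is required.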
