> modifying the spec to state that the ColumnMetaData following
> the chunk data is also optional
+1 on this

> adding language to the effect that if the value of file_offset is 0,
> then no such metadata is present in the file.

What about marking this as deprecated and discouraging its use?

Best,
Gang

On Tue, Jun 4, 2024 at 1:59 AM Ed Seidl <etse...@live.com> wrote:
> Hi all,
> While investigating a parquet-java issue with the file_offset field in
> ColumnChunk [1] I discovered that it appears parquet-java does not (and
> perhaps never did?) write a copy of the ColumnMetaData following the
> column chunk data. This IMO violates the specification [2]. Instead,
> parquet-java seems to exclusively use the "optional" copy in the footer.
> Given that this issue has AFAICT never resulted in compatibility issues
> with other parquet readers, I'm wondering if it's safe to assume no one
> actually uses the mandated copy trailing the chunk data. In that case,
> would it make sense to modify the specification to match the reality on
> the ground? I would propose modifying the spec to state that the
> ColumnMetaData following the chunk data is also optional. Given that the
> file_offset field is required, I'd also propose adding language to the
> effect that if the value of file_offset is 0, then no such metadata is
> present in the file.
>
> Thoughts?
>
> Thanks,
> Ed
>
> [1] https://issues.apache.org/jira/browse/PARQUET-2139
> [2] https://github.com/apache/parquet-format?tab=readme-ov-file#file-format
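To make the proposed file_offset rule concrete, here is a minimal Python sketch of how a reader might locate a chunk's ColumnMetaData if the change were adopted. It assumes the reader prefers the footer copy (what parquet-java relies on) and treats file_offset == 0 as "no trailing copy". The dataclasses are simplified, hypothetical stand-ins for the Thrift ColumnChunk/ColumnMetaData structs, modeling only the fields discussed here, and read_trailing is a hypothetical callback, not part of any Parquet library API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ColumnMetaData:
    # Hypothetical stand-in for the Thrift ColumnMetaData struct; only two
    # illustrative fields are modeled.
    path_in_schema: List[str]
    num_values: int


@dataclass
class ColumnChunk:
    # Models the two ColumnChunk fields discussed in the thread:
    # file_offset (required) and meta_data (the "optional" footer copy).
    file_offset: int
    meta_data: Optional[ColumnMetaData] = None


def resolve_column_metadata(
    chunk: ColumnChunk,
    read_trailing: Callable[[int], ColumnMetaData],
) -> ColumnMetaData:
    """Return a chunk's ColumnMetaData under the proposed rule.

    read_trailing is a hypothetical callback that deserializes a
    ColumnMetaData stored at the given byte offset in the file.
    """
    # Prefer the footer copy when it is present.
    if chunk.meta_data is not None:
        return chunk.meta_data
    # Proposed rule: file_offset == 0 means no trailing copy was written.
    if chunk.file_offset == 0:
        raise ValueError("no ColumnMetaData in footer and file_offset is 0")
    # Otherwise fall back to the copy that trails the chunk data.
    return read_trailing(chunk.file_offset)
```

Under this reading, a file written the way parquet-java writes it (footer copy only, file_offset set to 0) never requires seeking to the trailing copy, while older files that do carry one remain readable.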