> modifying the spec to state that the ColumnMetaData following
> the chunk data is also optional

+1 on this

> adding language to the effect that if the value of file_offset is 0,
> then no such metadata is present in the file.

What about marking this as deprecated and discouraging its use?

Best,
Gang


On Tue, Jun 4, 2024 at 1:59 AM Ed Seidl <etse...@live.com> wrote:

> Hi all,
> While investigating a parquet-java issue with the file_offset field in
> ColumnChunk [1], I discovered that parquet-java apparently does not (and
> perhaps never did?) write a copy of the ColumnMetaData following the
> column chunk data. This IMO violates the specification [2]. Instead,
> parquet-java seems to exclusively use the "optional" copy in the footer.
> Given that this issue has AFAICT never resulted in compatibility issues
> with other parquet readers, I'm wondering if it's safe to assume no one
> actually uses the mandated copy trailing the chunk data. In that case,
> would it make sense to modify the specification to match the reality on
> the ground? I would propose modifying the spec to state that the
> ColumnMetaData following the chunk data is also optional. Given that the
> file_offset field is required, I'd also propose adding language to the
> effect that if the value of file_offset is 0, then no such metadata is
> present in the file.
>
> Thoughts?
>
> Thanks,
> Ed
>
> [1] https://issues.apache.org/jira/browse/PARQUET-2139
> [2]
> https://github.com/apache/parquet-format?tab=readme-ov-file#file-format
>
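For illustration only, the rule proposed above (a file_offset of 0 means no ColumnMetaData copy trails the chunk data) might look like this from a reader's side. This is a minimal sketch, not parquet-java code: the ColumnChunk class here is a simplified, hypothetical stand-in for the Thrift struct, and trailing_metadata_offset is a made-up helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnChunk:
    # Hypothetical, simplified stand-in for the Thrift ColumnChunk struct;
    # only the field discussed in this thread is modeled.
    file_offset: int


def trailing_metadata_offset(chunk: ColumnChunk) -> Optional[int]:
    """Return where a trailing ColumnMetaData copy would be read from.

    Under the proposed wording, file_offset == 0 signals that no copy
    follows the chunk data, so the reader must rely on the footer copy.
    """
    if chunk.file_offset == 0:
        return None  # no trailing copy; use the footer metadata instead
    return chunk.file_offset
```

A reader following this sketch would fall back to the footer's metadata whenever the helper returns None, which matches what parquet-java is described as doing today.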
