Hey Micah,

For some reason, your email ended up in my spam box 😨

There is a reason for everything!

> .gz.metadata.json is quite uncommon and can't be read by most existing
> tools. Would it be better to support .metadata.json.gz and treat
> .gz.metadata.json as legacy for backward compatibility?


The Java client supports both
<https://github.com/apache/iceberg/blob/dc26b72ad016840b79d62bf8a84b7f2109e9b71b/core/src/test/java/org/apache/iceberg/TableMetadataParserCodecTest.java#L29-L40>.
I looked into this years ago, and if I recall correctly, the
.gz.metadata.json naming was chosen to bypass Hadoop's decompressor
<https://github.com/apache/iceberg/pull/258/>. Hadoop would detect the .gz
suffix and handle all the (de)compression itself, which we wanted to do
ourselves.
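
To illustrate what accepting both names could look like, here is a minimal
sketch in Python. It is not the Java client's implementation; the function
names `metadata_codec` and `read_metadata` and the return values "gzip" and
"none" are assumptions made up for this example.

```python
import gzip
import json


# Hypothetical helper: the name and return values are illustrative only.
def metadata_codec(file_name: str) -> str:
    """Infer the compression codec from a metadata file name."""
    # Check the compressed names first: the legacy name also ends in
    # ".metadata.json", so the order of these checks matters.
    if file_name.endswith(".gz.metadata.json"):
        return "gzip"  # legacy naming, keeps ".gz" out of the final extension
    if file_name.endswith(".metadata.json.gz"):
        return "gzip"  # conventional naming that generic tools recognise
    if file_name.endswith(".metadata.json"):
        return "none"
    raise ValueError(f"not a metadata file: {file_name}")


# Hypothetical helper: takes the raw bytes so no file system layer is involved.
def read_metadata(file_name: str, raw: bytes) -> dict:
    """Decompress explicitly, based on the name, then parse the JSON."""
    if metadata_codec(file_name) == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw)
```

The reader decides on the codec itself from the file name, so it does not
depend on a storage layer decompressing anything on its behalf.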

> gzip is becoming increasingly outdated due to its lack of support for
> modern CPUs. New algorithms like zstd are gaining popularity, so should
> we consider allowing users to use .metadata.json.zst as well?


Yes, I think that would make a lot of sense.

Kind regards,
Fokko




On Mon, Apr 28, 2025 at 08:41, Xuanwo <xua...@apache.org> wrote:

> I've copied my comments from GitHub here for a broader discussion:
>
>
>
> Hi, I have two concerns about this change:
>
>    - .gz.metadata.json is quite uncommon and can't be read by most
>    existing tools. Would it be better to support .metadata.json.gz and
>    treat .gz.metadata.json as legacy for backward compatibility?
>    - gzip is becoming increasingly outdated due to its lack of support
>    for modern CPUs. New algorithms like zstd are gaining popularity, so
>    should we consider allowing users to use .metadata.json.zst as well?
>
>
> On Sun, Apr 27, 2025, at 07:36, Micah Kornfield wrote:
>
> I created https://github.com/apache/iceberg/pull/12598 to document this
> feature.  Kevin Liu already took a look, but I would like to get more eyes
> on it before starting a vote for merging.
>
> Thanks,
> Micah
>
> Xuanwo
>
> https://xuanwo.io/
>
>
