Hi Everyone,

At present, Iceberg uses only the Avro, JSON, and Puffin formats for
metadata. A few past discussions have explored supporting this existing
metadata in Parquet or ORC format as well. Now, with the addition of
partition statistics
<https://github.com/apache/iceberg/blob/main/format/spec.md#partition-statistics-file>,
part of Iceberg's metadata (the stats file) will be represented in Parquet
or ORC format.

To write metadata in Parquet or ORC format, the `iceberg-core` module will
need to make extensive use of the functions found in the `iceberg-parquet`
and `iceberg-orc` modules. However, both of those modules already depend on
`iceberg-core`, so a direct dependency in the other direction would be
circular. Consequently, I suggest merging `iceberg-parquet` and
`iceberg-orc` into the `iceberg-core` module as packages.

For end users, the main change in the new release will be the absence of
separate `iceberg-parquet` and `iceberg-orc` JAR files. Instead, they can
depend on `iceberg-core` alone (which they were likely doing already). This
change will also be clearly documented in the release notes.

I would appreciate hearing your thoughts on this proposal.

For a detailed look at the code changes required to integrate
`iceberg-parquet` into `iceberg-core`, please refer to the following PR:
https://github.com/apache/iceberg/pull/8500

Thanks,
Ajantha
