Hi Everyone,

At present, Iceberg uses only the Avro, JSON, and Puffin formats for metadata. A few past discussions have explored supporting this existing metadata in Parquet or ORC format. However, with the addition of partition statistics <https://github.com/apache/iceberg/blob/main/format/spec.md#partition-statistics-file>, part of Iceberg's metadata (the stats file) will be written in Parquet or ORC format.

To write metadata in Parquet or ORC format, the `iceberg-core` module will need to make extensive use of functions from the `iceberg-parquet` and `iceberg-orc` modules. However, `iceberg-core` cannot depend on `iceberg-parquet` and `iceberg-orc` directly, because those modules already depend on `iceberg-core` and the result would be a circular dependency. I therefore suggest merging `iceberg-parquet` and `iceberg-orc` into the `iceberg-core` module as packages.

For end users, the main change in the new release will be the absence of separate `iceberg-parquet` and `iceberg-orc` JAR files. Instead, they can depend on `iceberg-core` (which they were likely doing already). This change will also be clearly documented in the release notes.

I would appreciate hearing your thoughts on this proposal. For a detailed look at the code changes required to merge `iceberg-parquet` into `iceberg-core`, please refer to the following PR: https://github.com/apache/iceberg/pull/8500

Thanks,
Ajantha