Hey Ajantha,

Wouldn't this require a major version bump, considering it is a breaking change for users who currently depend on iceberg-parquet or iceberg-orc?
Gabor

On Thu, Nov 2, 2023 at 3:01 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Hi Everyone,
>
> At present, Iceberg exclusively utilizes Avro, JSON, and Puffin formats to
> handle metadata. Few discussions in the past have explored the possibility
> of supporting these existing metadata in Parquet or ORC format. However,
> with the addition of partition statistics
> <https://github.com/apache/iceberg/blob/main/format/spec.md#partition-statistics-file>,
> Iceberg's metadata (stats file) will be represented in Parquet or ORC
> formats.
>
> To enable the `iceberg-core` module to write metadata in Parquet or ORC
> format, it will make extensive use of the functions found in the
> `iceberg-parquet` and `iceberg-orc` modules. However, due to a circular
> dependency issue, `iceberg-core` cannot directly rely on `iceberg-parquet`
> and `iceberg-orc`. Consequently, I suggest merging `iceberg-parquet` and
> `iceberg-orc` as packages within the `iceberg-core` module.
>
> For end users, the main change in the new release package will be the
> absence of separate `iceberg-parquet` and `iceberg-orc` JAR files. Instead,
> they can depend on `iceberg-core` (which they were likely doing already).
> This change will also be clearly documented in the release notes.
>
> I would appreciate hearing your thoughts on this proposal.
>
> For a detailed look at the code changes required to implement the
> integration of `iceberg-parquet` into `iceberg-core`, please refer to the
> following PR:
> https://github.com/apache/iceberg/pull/8500
>
> Thanks,
> Ajantha