Hey Ajantha,

Wouldn't this require a major version bump, since it is a breaking change
for users who currently depend on iceberg-parquet or iceberg-orc?

Gabor

On Thu, Nov 2, 2023 at 3:01 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Hi Everyone,
>
> At present, Iceberg uses only the Avro, JSON, and Puffin formats to
> handle metadata. A few past discussions have explored the possibility
> of supporting this existing metadata in Parquet or ORC format. However,
> with the addition of partition statistics
> <https://github.com/apache/iceberg/blob/main/format/spec.md#partition-statistics-file>,
> Iceberg's metadata (the stats file) will be represented in Parquet or
> ORC format.
>
> To enable the `iceberg-core` module to write metadata in Parquet or ORC
> format, it will make extensive use of the functions found in the
> `iceberg-parquet` and `iceberg-orc` modules. However, due to a circular
> dependency issue, `iceberg-core` cannot directly rely on
> `iceberg-parquet` and `iceberg-orc`. Consequently, I suggest merging
> `iceberg-parquet` and `iceberg-orc` as packages within the
> `iceberg-core` module.
>
> For end users, the main change in the new release package will be the
> absence of separate `iceberg-parquet` and `iceberg-orc` JAR files.
> Instead, they can depend on `iceberg-core` (which they were likely doing
> already). This change will also be clearly documented in the release
> notes.
>
> I would appreciate hearing your thoughts on this proposal.
>
> For a detailed look at the code changes required to integrate
> `iceberg-parquet` into `iceberg-core`, please refer to the following PR:
> https://github.com/apache/iceberg/pull/8500
>
> Thanks,
> Ajantha
>
