Thanks for taking the lead on this, Peter! The Dynamic Iceberg Sink is designed to address several challenges with the current Flink Iceberg sink.
It offers three main benefits:

1. *Flexibility in Table Writing*: It allows writing to multiple tables,
   eliminating the 1:1 sink-to-table restriction.
2. *Dynamic Table Creation and Updates*: Tables can be created and updated
   automatically based on user-defined routing logic.
3. *Schema and Partitioning Updates*: It supports dynamic updates to table
   schemas and partitioning according to user-provided specifications.

In my eyes, these new features are huge and will further drive Iceberg
adoption among Flink users. (A rough, purely illustrative sketch of how
user-defined routing could look is appended after Peter's quoted mail below.)

Cheers,
Max

On Mon, Nov 11, 2024 at 6:32 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Hi Team,
>
> Max Michels and I have started working on enhancing the current Iceberg
> Sink to allow inserting evolving records into a changing table.
> See:
> https://docs.google.com/document/d/1R3NZmi65S4lwnmNjH4gLCuXZbgvZV5GNrQKJ5NYdO9s
> We created a project to track the lifecycle of the proposal:
> https://github.com/orgs/apache/projects/429
>
> From the abstract:
> ---------
> The Flink Iceberg connector sink is the tool for writing data to an
> Iceberg table from a continuous Flink stream. The current sink
> implementations emphasize throughput over flexibility. The main limiting
> factor is that the Iceberg Flink Sink requires a static table structure:
> the table, the schema, and the partitioning specification all need to be
> constant. If any of these changes, the Flink job needs to be restarted.
> This allows optimal record serialization and good performance, but
> real-life use cases need to work around this limitation when the
> underlying table changes. We need to provide a tool to accommodate these
> changes.
> [..]
> The following typical use cases are considered in this design:
> - The schema of the incoming Avro records changes (new columns are added,
> or other backward-compatible changes happen). The Flink job is expected to
> update the table schema dynamically and continue to ingest data with both
> the new and the old schema without a job restart.
> - Incoming records define the target Iceberg table dynamically. The Flink
> job is expected to create the new table(s) and continue writing to them
> without a job restart.
> - The partitioning spec of the table changes. The Flink job is expected to
> update the specification and continue writing to the target table without
> a job restart.
> ---------
>
> If you have any questions, ideas, or suggestions, please let us know here
> or in comments on the document.
>
> Thanks,
> Peter
>
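To make the "user-defined routing logic" above a bit more concrete, here is
a rough sketch. None of the names below (DynamicRecordRouter, TableTarget,
TenantRouter, the "tenant" field) exist in Iceberg or Flink today; they are
made up purely to illustrate the idea of per-record routing, and the actual
interface is still being designed in the linked proposal.

// Hypothetical sketch only -- not an actual Iceberg or Flink API.
import org.apache.avro.generic.GenericRecord;

/** Describes where a single incoming record should be written. */
final class TableTarget {
    final String tableName;      // fully qualified Iceberg table name
    final GenericRecord record;  // payload carrying its own (possibly evolved) Avro schema

    TableTarget(String tableName, GenericRecord record) {
        this.tableName = tableName;
        this.record = record;
    }
}

/** User-defined routing logic: maps each incoming record to a target table. */
interface DynamicRecordRouter {
    TableTarget route(GenericRecord record);
}

/** Example: one table per tenant, derived from a field of the record. */
class TenantRouter implements DynamicRecordRouter {
    @Override
    public TableTarget route(GenericRecord record) {
        String tenant = record.get("tenant").toString();
        return new TableTarget("db.events_" + tenant, record);
    }
}

Per the use cases in the abstract, a sink built around such routing would be
expected to create "db.events_<tenant>" if it does not exist yet, and to
evolve its schema when the Avro schema of the incoming records changes, all
without a job restart.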