Thanks for taking the lead on this, Peter! The Dynamic Iceberg Sink is designed to address several challenges with the current Flink Iceberg sink.
It offers three main benefits:

1. *Flexibility in Table Writing*: It allows writing to multiple tables,
   eliminating the 1:1 sink-to-table restriction.
2. *Dynamic Table Creation and Updates*: Tables can be created and updated
   automatically based on user-defined routing logic.
3. *Schema and Partitioning Updates*: It supports dynamic updates to table
   schemas and partitioning according to user-provided specifications.

In my eyes, these new features are huge and will further drive Iceberg
adoption among Flink users. (A rough, purely illustrative sketch of how
user-defined routing could look is appended after Peter's quoted mail below.)

Cheers,
Max

On Mon, Nov 11, 2024 at 6:32 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Hi Team,
>
> Max Michels and I have started working on enhancing the current Iceberg
> Sink to allow inserting evolving records into a changing table.
> See:
> https://docs.google.com/document/d/1R3NZmi65S4lwnmNjH4gLCuXZbgvZV5GNrQKJ5NYdO9s
> We created a project to track the lifecycle of the proposal:
> https://github.com/orgs/apache/projects/429
>
> From the abstract:
> ---------
> The Flink Iceberg connector sink is the tool for writing data to an
> Iceberg table from a continuous Flink stream. The current sink
> implementations emphasize throughput over flexibility. The main limiting
> factor is that the Iceberg Flink Sink requires a static table structure:
> the table, the schema, and the partitioning specification all need to be
> constant. If any of these changes, the Flink job needs to be restarted.
> This allows optimal record serialization and good performance, but
> real-life use cases need to work around this limitation when the
> underlying table changes. We need to provide a tool to accommodate these
> changes.
> [..]
> The following typical use cases are considered in this design:
> - The schema of the incoming Avro records changes (new columns are added,
> or other backward-compatible changes happen). The Flink job is expected to
> update the table schema dynamically and continue to ingest data with both
> the new and the old schema without a job restart.
> - Incoming records define the target Iceberg table dynamically. The Flink
> job is expected to create the new table(s) and continue writing to them
> without a job restart.
> - The partitioning spec of the table changes. The Flink job is expected to
> update the specification and continue writing to the target table without
> a job restart.
> ---------
>
> If you have any questions, ideas, or suggestions, please let us know here
> or in comments on the document.
>
> Thanks,
> Peter
>
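To make the "user-defined routing logic" above a bit more concrete, here is
a rough sketch. None of the names below (DynamicRecordRouter, TableTarget,
TenantRouter, the "tenant" field) exist in Iceberg or Flink today; they are
made up purely to illustrate the idea of per-record routing, and the actual
interface is still being designed in the linked proposal.

// Hypothetical sketch only -- not an actual Iceberg or Flink API.
import org.apache.avro.generic.GenericRecord;

/** Describes where a single incoming record should be written. */
final class TableTarget {
    final String tableName;      // fully qualified Iceberg table name
    final GenericRecord record;  // payload carrying its own (possibly evolved) Avro schema

    TableTarget(String tableName, GenericRecord record) {
        this.tableName = tableName;
        this.record = record;
    }
}

/** User-defined routing logic: maps each incoming record to a target table. */
interface DynamicRecordRouter {
    TableTarget route(GenericRecord record);
}

/** Example: one table per tenant, derived from a field of the record. */
class TenantRouter implements DynamicRecordRouter {
    @Override
    public TableTarget route(GenericRecord record) {
        String tenant = record.get("tenant").toString();
        return new TableTarget("db.events_" + tenant, record);
    }
}

Per the use cases in the abstract, a sink built around such routing would be
expected to create "db.events_<tenant>" if it does not exist yet, and to
evolve its schema when the Avro schema of the incoming records changes, all
without a job restart.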