caseylucas commented on issue #2208: URL: https://github.com/apache/iceberg/issues/2208#issuecomment-774019528
I work with Asha, so chiming in (and seeking guidance).

Assumptions:
1. There will be N destination Iceberg tables that have *different schemas*.
2. N will grow over time and may be large, so a correspondingly growing number of Kafka topics is probably unreasonable.

So the scenario looks like:
```
1. data for all N destinations (packaged up with some discriminator)
   ---> 2. single Kafka topic
   ---> 3. Flink job (reading from Kafka source)
   ---> 4. N Iceberg tables
```
In steps 3 and 4, I assume there would need to be some demultiplexing, i.e. selecting the appropriate Iceberg table dynamically. There are options for doing this on the Kafka side (https://stackoverflow.com/questions/45391877/apache-flink-dynamic-number-of-sinks): for example, a `KeyedSerializationSchema` (https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/streaming/util/serialization/KeyedSerializationSchema.html) allows runtime selection of a destination topic. The Iceberg sink, however, doesn't appear to currently support this scenario.

Being new to Iceberg and Flink, I'm wondering whether this demultiplexing needs to be implemented in the Iceberg Flink sink, or whether there is some built-in Flink support or concept that we're missing. If demultiplexing does need to be built into the Flink sink, are there any issues we should consider before we go down the path of implementing such a strategy?
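In case it helps frame the question, here's roughly the shape of what we were picturing: a minimal sketch (not a working proposal) that uses Flink side outputs to fan a single stream out to several `FlinkSink`s. `MultiTableRouting`, `RoutedRecord`, `tableName`, and `wireSinks` are hypothetical names of ours, and the sketch assumes each record's payload has already been converted to `RowData` matching its target table's schema:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.table.data.RowData;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class MultiTableRouting {

  // Hypothetical envelope: the payload plus the discriminator naming its table.
  public static class RoutedRecord {
    public String tableName; // discriminator carried through the Kafka topic
    public RowData row;      // payload already converted to that table's schema
  }

  public static void wireSinks(DataStream<RoutedRecord> input,
                               Map<String, TableLoader> tables) {
    // One side-output tag per known destination table.
    Map<String, OutputTag<RowData>> tags = new HashMap<>();
    for (String name : tables.keySet()) {
      // Anonymous subclass so Flink can capture the RowData type parameter.
      tags.put(name, new OutputTag<RowData>(name) {});
    }

    SingleOutputStreamOperator<RowData> routed = input.process(
        new ProcessFunction<RoutedRecord, RowData>() {
          @Override
          public void processElement(RoutedRecord rec, Context ctx,
                                     Collector<RowData> out) {
            OutputTag<RowData> tag = tags.get(rec.tableName);
            if (tag != null) {
              ctx.output(tag, rec.row); // demultiplex to the matching side output
            }
            // else: unknown table -- dead-letter handling omitted in this sketch
          }
        });

    // Attach one Iceberg sink per table to its side output.
    for (Map.Entry<String, TableLoader> entry : tables.entrySet()) {
      FlinkSink.forRowData(routed.getSideOutput(tags.get(entry.getKey())))
          .tableLoader(entry.getValue())
          .append(); // .build() on older iceberg-flink releases
    }
  }
}
```

The catch, of course, is that the side-output tags and sinks are baked into the job graph at build time, so adding a new table means redeploying the job. That's exactly why we're asking whether the demultiplexing belongs inside the Iceberg sink itself.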
