caseylucas commented on issue #2208: URL: https://github.com/apache/iceberg/issues/2208#issuecomment-774019528
I work with Asha, so chiming in (and seeking guidance).

Assumptions:
1. There will be N destination Iceberg tables that have *different schemas*.
2. N will grow over time and may be large, so a correspondingly growing number of Kafka topics is probably unreasonable.

So the scenario looks like:
```
1. data for all N destinations (packaged up with some discriminator)
   ---> 2. single Kafka topic
   ---> 3. Flink job (reading from Kafka source)
   ---> 4. N Iceberg tables
```
In steps 3 and 4, I assume there would need to be some demultiplexing, i.e. selecting the appropriate Iceberg table dynamically. There are options for doing this on the Kafka side (https://stackoverflow.com/questions/45391877/apache-flink-dynamic-number-of-sinks): for example, a `KeyedSerializationSchema` (https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/streaming/util/serialization/KeyedSerializationSchema.html) allows runtime selection of a destination topic. The Iceberg sink, however, doesn't appear to currently support this scenario.

Being new to Iceberg and Flink, I'm wondering whether this demultiplexing needs to be implemented in the Iceberg Flink sink, or whether there is some built-in Flink support or concept that we're missing. If demultiplexing does need to be built into the Flink sink, are there any issues we should consider before we go down the path of implementing such a strategy?
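In case it helps frame the question, here's roughly the shape of what we were picturing: a minimal sketch (not a working proposal) that uses Flink side outputs to fan a single stream out to several `FlinkSink`s. `MultiTableRouting`, `RoutedRecord`, `tableName`, and `wireSinks` are hypothetical names of ours, and the sketch assumes each record's payload has already been converted to `RowData` matching its target table's schema:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.table.data.RowData;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class MultiTableRouting {

  // Hypothetical envelope: the payload plus the discriminator naming its table.
  public static class RoutedRecord {
    public String tableName; // discriminator carried through the Kafka topic
    public RowData row;      // payload already converted to that table's schema
  }

  public static void wireSinks(DataStream<RoutedRecord> input,
                               Map<String, TableLoader> tables) {
    // One side-output tag per known destination table.
    Map<String, OutputTag<RowData>> tags = new HashMap<>();
    for (String name : tables.keySet()) {
      // Anonymous subclass so Flink can capture the RowData type parameter.
      tags.put(name, new OutputTag<RowData>(name) {});
    }

    SingleOutputStreamOperator<RowData> routed = input.process(
        new ProcessFunction<RoutedRecord, RowData>() {
          @Override
          public void processElement(RoutedRecord rec, Context ctx,
                                     Collector<RowData> out) {
            OutputTag<RowData> tag = tags.get(rec.tableName);
            if (tag != null) {
              ctx.output(tag, rec.row); // demultiplex to the matching side output
            }
            // else: unknown table -- dead-letter handling omitted in this sketch
          }
        });

    // Attach one Iceberg sink per table to its side output.
    for (Map.Entry<String, TableLoader> entry : tables.entrySet()) {
      FlinkSink.forRowData(routed.getSideOutput(tags.get(entry.getKey())))
          .tableLoader(entry.getValue())
          .append(); // .build() on older iceberg-flink releases
    }
  }
}
```

The catch, of course, is that the side-output tags and sinks are baked into the job graph at build time, so adding a new table means redeploying the job. That's exactly why we're asking whether the demultiplexing belongs inside the Iceberg sink itself.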
