stevenzwu commented on issue #2208: URL: https://github.com/apache/iceberg/issues/2208#issuecomment-774178386
Yeah, a single Kafka producer/sink can write to multiple Kafka topics as long as they are all on the same Kafka cluster. It is not without penalty, though, as it will affect data batching and increase disk I/O on the broker side.

It is very expensive for a single Iceberg sink to support a large and growing number of tables. The writers would need to keep many open files, which could lead to memory pressure in the writer tasks. When it is time to checkpoint and commit, the writers need to flush and upload files for hundreds of tables, and the committer needs to commit to hundreds of tables. That would be very slow. I would suggest doing the demux before the sink jobs that write to Iceberg.

Also, if you have a single Kafka topic holding a growing number of different datasets, you lose the benefit of schema validation when ingesting data into Kafka. Having a separate Kafka topic and schema validation for each dataset may also help with data quality.
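For illustration only, here is a minimal sketch of that demux step using the plain Kafka producer API: one producer routes each record to a per-dataset topic, so each downstream Iceberg sink job consumes exactly one topic. The broker address, serializers, topic naming scheme (`ingest.<dataset>`), and the sample records are all hypothetical.

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DatasetDemux {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        // A single producer instance can write to any number of topics
        // on the same cluster.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical records: dataset name -> payload.
            Map<String, String> records = Map.of(
                "orders", "{\"id\": 1}",
                "payments", "{\"id\": 2}");

            records.forEach((dataset, payload) -> {
                // Route each record to its own per-dataset topic; each topic
                // can then have its own schema validation and Iceberg sink job.
                String topic = "ingest." + dataset; // hypothetical naming scheme
                producer.send(new ProducerRecord<>(topic, dataset, payload));
            });
        }
    }
}
```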
