stevenzwu commented on issue #2208: URL: https://github.com/apache/iceberg/issues/2208#issuecomment-774178386
Yeah, a single Kafka producer/sink can write to multiple Kafka topics as long as they are all on the same Kafka cluster. It is not without penalty, though, as it will affect data batching and increase disk I/O on the broker side.

It is very expensive for a single Iceberg sink to support a large and growing number of tables. The writers would need to keep many open files, which could lead to memory pressure in the writer tasks. When it is time to checkpoint and commit, the writers need to flush and upload files for hundreds of tables, and the committer needs to commit to hundreds of tables. That would be very slow. I would suggest doing the demux before the sink jobs that write to Iceberg.

Also, if you have a single Kafka topic holding a growing number of different datasets, you lose the benefit of schema validation when ingesting data into Kafka. Having a separate Kafka topic and schema validation for each dataset may also help with data quality.
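For illustration only, here is a minimal sketch of that demux step using the plain Kafka producer API: one producer routes each record to a per-dataset topic, so each downstream Iceberg sink job consumes exactly one topic. The broker address, serializers, topic naming scheme (`ingest.<dataset>`), and the sample records are all hypothetical.

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DatasetDemux {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        // A single producer instance can write to any number of topics
        // on the same cluster.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical records: dataset name -> payload.
            Map<String, String> records = Map.of(
                "orders", "{\"id\": 1}",
                "payments", "{\"id\": 2}");

            records.forEach((dataset, payload) -> {
                // Route each record to its own per-dataset topic; each topic
                // can then have its own schema validation and Iceberg sink job.
                String topic = "ingest." + dataset; // hypothetical naming scheme
                producer.send(new ProducerRecord<>(topic, dataset, payload));
            });
        }
    }
}
```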
