coolderli commented on pull request #2680: URL: https://github.com/apache/iceberg/pull/2680#issuecomment-1031285338
@rdblue @stevenzwu Hi, what do you think of this PR? At my company, we use Flink to load some large tables, such as TiDB or MySQL binlog tables, into Iceberg. For example, one of our TiDB tables has six hundred million records. In Flink streaming mode the load takes too long, and in batch mode the executor needs a very large heap to avoid OOM. Any suggestions?

I think there are two separate problems:

1. In streaming mode, we can use RocksDB to avoid OOM during traffic peaks.
2. In Spark or Flink batch mode, we can introduce a configuration option to skip position deletes within a single transaction, because we can de-duplicate manually in the program or in SQL.
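To make the second point concrete, here is a minimal sketch of what "de-duplicate manually in the program" could look like before a batch write. It is not code from this PR or from Iceberg. The record shape `(primary_key, op, row)` and the function name `collapse_changelog` are hypothetical, and the sketch assumes the records arrive in commit order. It only collapses duplicates inside one batch. Keys that already exist in the table from earlier commits would still need equality deletes.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record shape: (primary_key, op, row).
# op is "I" (insert), "U" (update), or "D" (delete); row is None for deletes.
ChangeRecord = Tuple[int, str, Optional[dict]]


def collapse_changelog(records: Iterable[ChangeRecord]) -> List[dict]:
    """Keep only the final state of each primary key within one batch."""
    latest: Dict[int, Optional[dict]] = {}
    for key, op, row in records:  # assumed to be in commit order
        # A later change for the same key overwrites the earlier one.
        latest[key] = None if op == "D" else row
    # Keys whose last operation was a delete produce no output row.
    return [row for row in latest.values() if row is not None]


if __name__ == "__main__":
    batch = [
        (1, "I", {"id": 1, "v": "a"}),
        (2, "I", {"id": 2, "v": "b"}),
        (1, "U", {"id": 1, "v": "c"}),
        (2, "D", None),
    ]
    print(collapse_changelog(batch))
```

Once each key appears at most once in the batch, no row written in the transaction is superseded by a later row in the same transaction, which is the case the proposed option would rely on.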
