coolderli commented on pull request #2680:
URL: https://github.com/apache/iceberg/pull/2680#issuecomment-1031285338


   @rdblue @stevenzwu Hi, what do you think of this PR? At my company, we have 
some large tables, sourced from TiDB or MySQL binlogs, that we load into 
Iceberg with Flink. For example, one TiDB table has six hundred million 
records. In Flink streaming mode, the load takes too long. In batch mode, the 
executor needs a large heap to avoid OOM. Any suggestions about this?
   
   I think there are two different problems.
   1. When we use streaming, we can use RocksDB to avoid OOM during a traffic 
peak.
   2. When we use Spark or Flink batch, we can introduce a configuration option 
to skip position deletes within one transaction, because we can de-duplicate 
manually in the program or in SQL.
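
   To illustrate the manual de-duplication mentioned in point 2, here is a 
minimal sketch in plain Python. It assumes the batch input is ordered by change 
time and that the last row seen for each primary key is the one to keep. The 
function name `dedupe_latest` and the field names are hypothetical, and nothing 
here is an Iceberg, Spark, or Flink API.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def dedupe_latest(
    rows: Iterable[Dict[str, Any]], key_fields: Sequence[str]
) -> List[Dict[str, Any]]:
    """Keep only the last row seen for each primary key.

    Assumption: `rows` is ordered by change time, so a later row
    supersedes an earlier row with the same key.
    """
    latest: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        # A later row with the same key overwrites the earlier one.
        latest[key] = row
    return list(latest.values())


# Hypothetical change records: id=1 is written twice in the same batch.
changes = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
    {"id": 1, "value": "c"},
]
unique_rows = dedupe_latest(changes, ["id"])
```

   If every key appears at most once in the batch after a step like this, the 
writer should not need position deletes to remove earlier rows with the same 
key inside that transaction, which is the case the proposed option would cover.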

