BsoBird commented on issue #7568:
URL: https://github.com/apache/iceberg/issues/7568#issuecomment-1544261396

   @geonyeongkim 
   Hello. In my experience, triggering a rewrite after several consecutive 
checkpoints have been committed does not work well.
   
   This is because the rewrite action runs synchronously with the stream 
processing, so the streaming write blocks until the rewrite action's 
checkpoint succeeds. The more data there is in the partition, the slower the 
rewrite runs, and the end result is worse. For that reason we abandoned this 
approach in our production environment.
   
   The ideal approach is to run a separate service that monitors the Iceberg 
table for small files and periodically launches asynchronous rewrite tasks. 
Hudi has adopted this idea and will integrate this function into its 
DataStream internals. Iceberg currently has no comparable built-in 
integration, so this requires some additional development work.
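
   As a rough illustration of the separate-service idea, here is a minimal 
Python sketch of the decision loop only. Everything in it is an assumption 
made for illustration: the `DataFile` record, the two thresholds, and the 
`list_files` and `rewrite_partition` callbacks are hypothetical and are not 
Iceberg APIs. It only shows the loop running outside the streaming job, so 
checkpoints never wait on a rewrite.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Assumed thresholds, chosen only for illustration.
SMALL_FILE_BYTES = 32 * 1024 * 1024  # files below this size count as "small"
MIN_SMALL_FILES = 10                 # compact a partition at this many small files


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one data file entry in the table metadata."""
    partition: str
    size_bytes: int


def partitions_to_compact(
    files: Iterable[DataFile],
    small_file_bytes: int = SMALL_FILE_BYTES,
    min_small_files: int = MIN_SMALL_FILES,
) -> List[str]:
    """Return the partitions that hold at least min_small_files small files."""
    small_counts: Dict[str, int] = {}
    for f in files:
        if f.size_bytes < small_file_bytes:
            small_counts[f.partition] = small_counts.get(f.partition, 0) + 1
    return sorted(p for p, n in small_counts.items() if n >= min_small_files)


def run_once(
    list_files: Callable[[], Iterable[DataFile]],
    rewrite_partition: Callable[[str], None],
) -> List[str]:
    """One monitoring pass: scan the table, then rewrite each flagged partition.

    Both callbacks are hypothetical hooks supplied by the caller.
    """
    targets = partitions_to_compact(list_files())
    for partition in targets:
        rewrite_partition(partition)
    return targets


def run_periodically(
    list_files: Callable[[], Iterable[DataFile]],
    rewrite_partition: Callable[[str], None],
    interval_seconds: float,
    stop: threading.Event,
) -> None:
    """Repeat run_once in this separate process until stop is set."""
    while not stop.is_set():
        run_once(list_files, rewrite_partition)
        stop.wait(interval_seconds)
```

   In a real deployment, `list_files` would read the table's current file 
listing and `rewrite_partition` would submit a rewrite job for that partition 
(for example through Iceberg's rewrite data files action). How those two hooks 
are wired up depends on the engine and is not shown here.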


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

