BsoBird commented on issue #7568: URL: https://github.com/apache/iceberg/issues/7568#issuecomment-1544261396
@geonyeongkim Hello. In my experience, triggering a rewrite from inside the streaming job after a run of consecutive checkpoints does not work well. The rewrite action executes synchronously with stream processing, so the streaming write blocks while it waits for the checkpoint that carries the rewrite action to succeed. The more data a partition holds, the slower the rewrite, and the worse the end result. For this reason, we abandoned this approach in our production environment. A better approach is to run a separate service that monitors the Iceberg table for small files and periodically launches asynchronous rewrite tasks. Hudi has adopted this idea and integrates this functionality into its DataStream pipeline. Iceberg currently has no comparable built-in feature, so this requires some additional development work.
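A minimal sketch of the separate monitoring service described above, kept independent of any Iceberg client. It assumes two hypothetical callables: `list_data_files()`, returning data file sizes per partition, and `submit_rewrite(partition)`, which would launch a real compaction job (for example, Iceberg's Spark `rewrite_data_files` procedure). `CompactionPolicy`, `SmallFileMonitor`, and the 32 MiB / 5-file thresholds are illustrative choices, not Iceberg APIs or defaults. Rewrites run on a thread pool, so they never sit on the streaming write path.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CompactionPolicy:
    small_file_bytes: int = 32 * 1024 * 1024  # illustrative: files below this count as "small"
    min_small_files: int = 5                  # illustrative: compact once a partition has this many


def partitions_to_compact(file_sizes: Dict[str, List[int]],
                          policy: CompactionPolicy) -> List[str]:
    """Return partitions whose number of small files reaches the policy threshold."""
    result = []
    for partition, sizes in sorted(file_sizes.items()):
        small = sum(1 for size in sizes if size < policy.small_file_bytes)
        if small >= policy.min_small_files:
            result.append(partition)
    return result


class SmallFileMonitor:
    """Periodically scans a table and submits rewrites asynchronously (hypothetical service)."""

    def __init__(self,
                 list_data_files: Callable[[], Dict[str, List[int]]],  # hypothetical listing hook
                 submit_rewrite: Callable[[str], None],                # hypothetical rewrite hook
                 policy: CompactionPolicy,
                 interval_s: float = 300.0,
                 workers: int = 2):
        self._list = list_data_files
        self._rewrite = submit_rewrite
        self._policy = policy
        self._interval = interval_s
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._in_flight: Dict[str, Future] = {}
        self._stop = threading.Event()

    def scan_once(self) -> List[str]:
        """Run one scan; return the partitions for which a rewrite was newly submitted."""
        # Forget rewrites that have already finished.
        self._in_flight = {p: f for p, f in self._in_flight.items() if not f.done()}
        submitted = []
        for partition in partitions_to_compact(self._list(), self._policy):
            if partition in self._in_flight:
                continue  # never start two overlapping rewrites on the same partition
            self._in_flight[partition] = self._pool.submit(self._rewrite, partition)
            submitted.append(partition)
        return submitted

    def run_forever(self) -> None:
        """Scan every interval_s seconds until stop() is called."""
        while not self._stop.is_set():
            self.scan_once()
            self._stop.wait(self._interval)

    def stop(self) -> None:
        """Stop scanning and wait for in-flight rewrites to finish."""
        self._stop.set()
        self._pool.shutdown(wait=True)
```

For example, with one partition full of 1 MB files and one holding large files, only the first is queued for compaction:

```python
files = {"dt=2023-05-10": [1_000_000] * 8, "dt=2023-05-11": [128 * 1024 * 1024] * 3}
rewritten = []
monitor = SmallFileMonitor(lambda: files, rewritten.append, CompactionPolicy())
print(monitor.scan_once())  # ['dt=2023-05-10']
monitor.stop()
```

Tracking in-flight jobs per partition keeps a slow rewrite on a large partition from being resubmitted on every scan, which is the same pressure that makes the checkpoint-coupled approach degrade.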
