hudi-bot opened a new issue, #15389: URL: https://github.com/apache/hudi/issues/15389
When running a Flink application that inserts data into a Hudi table with async compaction enabled, we found that after running for some time the compactions became abnormal: they were scheduled and executed successfully, but never committed. We also observed a mismatch between the number of messages sent by compact_task and the number received by compact_commit, as shown in figure 1.

By inspecting the abnormal InputChannel state of the compact_commit operator with the Arthas tool, we found that the channel was waiting for a `huge` message of size 16M, far larger than a normal CompactionCommitEvent object, as shown in figure 2.

Currently, in the processElement() method of class CompactFunction, we use the collector to send the CompactionCommitEvent message asynchronously, but the Collector provided by Flink does not seem to be thread-safe. Could that be the cause of the corrupted message received by the compact_commit operator? Should we use the MailboxExecutorAdapter to run collector.collect, just like in StreamReadOperator?

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-4717
- Type: Bug
- Affects version(s):
  - 0.10.1
- Attachment(s):
  - 25/Aug/22 07:17;nonggia;figure 1.png;https://issues.apache.org/jira/secure/attachment/13048572/figure+1.png
  - 25/Aug/22 07:17;nonggia;figure 2.png;https://issues.apache.org/jira/secure/attachment/13048571/figure+2.png
  - 20/Jan/23 03:02;teng_huo;issue.png;https://issues.apache.org/jira/secure/attachment/13054683/issue.png

---

## Comments

20/Jan/23 03:05;teng_huo;Hi,

We hit exactly the same issue recently in our Flink MOR pipeline.

!issue.png!

I have checked the Hudi files, and all compaction operations completed, because the parquet files are good. I can't understand how events get lost between compact_task and compact_commit.

May I ask if there is any way to troubleshoot this issue?

Many thanks.;;;

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
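
The thread-safety concern raised in the issue description can be illustrated with a minimal plain-Java sketch. It does not use Flink or Hudi classes: `SingleThreadCollector`, `runCompactions`, and the event strings are hypothetical stand-ins, and a single-threaded `ExecutorService` only approximates the idea of handing `collect` calls back to one task thread, which is what the issue suggests doing with the mailbox executor. Worker threads do the asynchronous work, but only the "mailbox" thread ever touches the collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class MailboxEmitSketch {

    /** Hypothetical stand-in for a collector that is only safe on one thread. */
    static final class SingleThreadCollector {
        // Deliberately unsynchronized, like a buffer that assumes a single writer.
        private final List<String> emitted = new ArrayList<>();
        private Thread owner;

        void collect(String event) {
            Thread current = Thread.currentThread();
            if (owner == null) {
                owner = current; // first caller becomes the owning thread
            } else if (owner != current) {
                throw new IllegalStateException("collect() called from " + current.getName());
            }
            emitted.add(event);
        }
    }

    /** Runs `count` fake compactions on worker threads; emits only on the mailbox thread. */
    static List<String> runCompactions(int count) throws Exception {
        SingleThreadCollector collector = new SingleThreadCollector();
        ExecutorService mailbox = Executors.newSingleThreadExecutor(); // stand-in for the task thread
        ExecutorService workers = Executors.newFixedThreadPool(4);     // async compaction threads
        try {
            for (int i = 0; i < count; i++) {
                final int id = i;
                workers.execute(() -> {
                    String event = "commit-event-" + id; // pretend the compaction ran here
                    // Hand the emit back to the single mailbox thread instead of
                    // calling collector.collect(event) from this worker thread.
                    mailbox.execute(() -> collector.collect(event));
                });
            }
            workers.shutdown();
            workers.awaitTermination(10, TimeUnit.SECONDS);
            // Queued after every collect task, so the snapshot sees all events.
            return mailbox.submit(() -> new ArrayList<>(collector.emitted)).get();
        } finally {
            workers.shutdownNow();
            mailbox.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runCompactions(10).size() + " events emitted");
    }
}
```

If the worker lambda called `collector.collect(event)` directly, several threads would write to the unsynchronized buffer at once, which is the kind of race that could plausibly corrupt a serialized record. Whether that is the actual root cause here is still the open question of this issue.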
