hudi-bot opened a new issue, #15389:
URL: https://github.com/apache/hudi/issues/15389

   When running a Flink application that inserts data into a Hudi table with async 
compaction enabled, we found that after running for some time, compactions 
became abnormal: they were scheduled and executed successfully, but never committed. 
We also observed a mismatch between the number of messages sent by compact_task and 
the number received by compact_commit, as shown in figure 1.
   
   By inspecting the abnormal InputChannel state of the compact_commit 
operator with Arthas, we found that the channel was waiting for a `huge` 
message of size 16M, which is far larger than a normal 
CompactionCommitEvent object, as shown in figure 2.
   
   Currently, in the processElement() method of the CompactFunction class, we use 
the collector to send the CompactionCommitEvent message asynchronously, but the 
Collector provided by Flink does not seem to be thread-safe. Could that be the cause 
of the corrupted message received by the compact_commit operator? Should we 
use the MailboxExecutorAdapter to run collector.collect, as is done in 
StreamReadOperator?
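
   To make the proposal concrete, here is a minimal sketch of the hand-off pattern in plain Java. It does not use the Flink or Hudi APIs: `UnsafeCollector`, `emitViaMailbox`, and the single-threaded `mailbox` executor are hypothetical stand-ins for the Collector, the compaction work, and the operator's mailbox thread. The only point it illustrates is that worker threads produce results while a single thread makes every `collect` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class MailboxHandoffSketch {

    /** Hypothetical stand-in for a collector that is NOT thread-safe. */
    static final class UnsafeCollector {
        private final List<String> out = new ArrayList<>(); // deliberately unsynchronized

        void collect(String event) {
            out.add(event);
        }

        List<String> records() {
            return out;
        }
    }

    /**
     * Runs the simulated compaction tasks on a worker pool, but routes every
     * collect() call through one single-threaded executor (the "mailbox").
     */
    static List<String> emitViaMailbox(int workers, int tasks) {
        UnsafeCollector collector = new UnsafeCollector();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        ExecutorService mailbox = Executors.newSingleThreadExecutor();
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                pending.add(CompletableFuture
                        // The expensive part runs concurrently on the pool.
                        .supplyAsync(() -> "commit-event-" + id, pool)
                        // The emit step always runs on the mailbox thread.
                        .thenAcceptAsync(collector::collect, mailbox));
            }
            // Wait until every event has been handed to the collector.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            return new ArrayList<>(collector.records());
        } finally {
            pool.shutdown();
            mailbox.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> events = emitViaMailbox(4, 200);
        System.out.println(events.size() + " events collected");
    }
}
```

   If `collector::collect` were instead called directly from the pool threads, several threads could write to the unsynchronized collector at once. That is the kind of interleaving suspected above, although this sketch does not prove it is the cause of the oversized message.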
   
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-4717
   - Type: Bug
   - Affects version(s):
     - 0.10.1
   - Attachment(s):
     - 25/Aug/22 07:17;nonggia;figure 1.png;https://issues.apache.org/jira/secure/attachment/13048572/figure+1.png
     - 25/Aug/22 07:17;nonggia;figure 2.png;https://issues.apache.org/jira/secure/attachment/13048571/figure+2.png
     - 20/Jan/23 03:02;teng_huo;issue.png;https://issues.apache.org/jira/secure/attachment/13054683/issue.png
   
   
   ---
   
   
   ## Comments
   
   20/Jan/23 03:05;teng_huo;Hi,
   We recently hit exactly the same issue in our Flink MOR pipeline.
   
    !issue.png! 
   
   I have checked the Hudi files, and all compaction operations appear to have 
completed, since the parquet files are good. I can't understand how events get 
lost between compact_task and compact_commit.
   Is there any way to troubleshoot this issue? Many thanks.


