nonggia.liang created HUDI-4717:
-----------------------------------
Summary: CompactionCommitEvent message corrupted when sent by
compact_task
Key: HUDI-4717
URL: https://issues.apache.org/jira/browse/HUDI-4717
Project: Apache Hudi
Issue Type: Bug
Components: flink, flink-sql
Reporter: nonggia.liang
Attachments: figure 1.png, figure 2.png
When running a Flink application that inserts data into a Hudi table with async
compaction enabled, we found that after running for some time, compactions
become abnormal: they are scheduled and executed successfully, but never
committed. We also observed a mismatch between the number of messages sent by
compact_task and the number received by compact_commit, as shown in figure 1.
By inspecting the abnormal InputChannel state of the compact_commit operator
with the Arthas tool, we found that the channel is waiting for a `huge` message
of size 16M, which is far larger than a normal CompactionCommitEvent object, as
shown in figure 2.
Currently, in the processElement() method of class CompactFunction, we use the
collector to send CompactionCommitEvent messages asynchronously, but the
Collector provided by Flink does not seem to be thread-safe. Could that be the
cause of the corrupted message received by the compact_commit operator? Should
we use the MailboxExecutorAdapter to run collector.collect, as is done in
StreamReadOperator?
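
To illustrate the idea, here is a minimal sketch that uses only the JDK, not the
Flink or Hudi APIs. All names in it (MailboxCollectSketch, UnsafeCollector,
runCompactions) are hypothetical, and a single-thread executor stands in for the
mailbox as an assumption. Worker threads do the compaction work, but every
collect() call is handed to one thread, so the non-thread-safe collector is
never touched concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MailboxCollectSketch {

    /** Hypothetical stand-in for a collector that is NOT thread-safe. */
    static final class UnsafeCollector {
        private final List<String> out = new ArrayList<>();

        void collect(String event) {
            out.add(event); // unsynchronized on purpose
        }

        List<String> records() {
            return out;
        }
    }

    /** Runs the given number of fake compactions and returns the emitted events. */
    static List<String> runCompactions(int tasks) {
        UnsafeCollector collector = new UnsafeCollector();
        // Async compaction workers (assumption: a small fixed pool).
        ExecutorService workers = Executors.newFixedThreadPool(4);
        // Stand-in for the mailbox: exactly one thread ever calls collect().
        ExecutorService mailbox = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                workers.execute(() -> {
                    // Pretend the compaction ran here, off the mailbox thread.
                    String event = "commit-event-" + id;
                    // Hand the emit over instead of calling collect() directly.
                    mailbox.execute(() -> collector.collect(event));
                });
            }
            // Wait for all workers first so every emit has been enqueued.
            workers.shutdown();
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            // Then drain the mailbox queue.
            mailbox.shutdown();
            if (!mailbox.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("mailbox did not drain in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return collector.records();
    }

    public static void main(String[] args) {
        System.out.println("events collected: " + runCompactions(1000).size());
    }
}
```

If the workers called collector.collect(event) directly, concurrent writes to
the unsynchronized list could lose or corrupt entries. That is the kind of race
we suspect in CompactFunction, although the sketch does not prove it is the
actual cause there.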
--
This message was sent by Atlassian Jira
(v8.20.10#820010)