chenfengLiu created HUDI-4350:
---------------------------------
Summary: reduce the shuffle work when we just insert but not
update and delete
Key: HUDI-4350
URL: https://issues.apache.org/jira/browse/HUDI-4350
Project: Apache Hudi
Issue Type: Improvement
Components: flink
Reporter: chenfengLiu
As discussed in https://issues.apache.org/jira/browse/HUDI-4338, extra
shuffle work adds network overhead and increases the risk of data skew.
So the original plan we build for the Flink data stream that writes to Hudi
can be improved on this point.
Currently, if we want to update or delete records, we need to load the index
first, then send the index records and the hoodie records to the Bucket Assign
Operator.
The Bucket Assign Operator builds the index state used to assign a bucket to
each incoming record.
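
To make the role of the index state concrete, here is a minimal sketch in plain
Java. It is not Hudi code: the class BucketAssignSketch, its methods, and the
bucket id format are all hypothetical, and the index state is assumed to be a
simple map from record key to bucket id.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of an index state: record key -> bucket id (all names hypothetical). */
final class BucketAssignSketch {
    private final Map<String, String> indexState = new HashMap<>();
    private int nextBucket = 0;

    /** Loads one index record, i.e. the known bucket of an existing record key. */
    void loadIndexRecord(String recordKey, String bucketId) {
        indexState.put(recordKey, bucketId);
    }

    /**
     * Returns the bucket already recorded for a known key (update/delete case),
     * or assigns and remembers a new bucket for an unseen key (insert case).
     */
    String assign(String recordKey) {
        return indexState.computeIfAbsent(recordKey, k -> "new-bucket-" + nextBucket++);
    }

    public static void main(String[] args) {
        BucketAssignSketch assigner = new BucketAssignSketch();
        assigner.loadIndexRecord("k1", "bucket-a");
        System.out.println(assigner.assign("k1")); // existing key keeps its bucket
        System.out.println(assigner.assign("k2")); // unseen key gets a new bucket
    }
}
```

The point of the sketch is that the lookup only matters for keys that may
already exist; for brand-new keys the state contributes nothing.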
If we only insert new records and never update or delete, we don't need this
work, such as building the index and repartitioning the existing records.
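
The proposal can be sketched as a conditional write plan. This is only an
illustration in plain Java, not the Hudi pipeline code: WritePlanSketch,
planStages, the insertOnly flag, and the stage names are hypothetical and just
mirror the steps described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy write plan: lists the stages a record passes through (names hypothetical). */
final class WritePlanSketch {

    /** Builds the stage list; the insert-only plan skips the index-related stages. */
    static List<String> planStages(boolean insertOnly) {
        List<String> stages = new ArrayList<>();
        stages.add("source");
        if (!insertOnly) {
            // Only needed when records may update or delete existing data.
            stages.add("load_index");
            stages.add("repartition_by_record_key"); // the shuffle we want to avoid
            stages.add("bucket_assign");
        }
        stages.add("write");
        return stages;
    }

    public static void main(String[] args) {
        System.out.println("upsert plan:      " + planStages(false));
        System.out.println("insert-only plan: " + planStages(true));
    }
}
```

In this sketch the insert-only plan goes directly from the source to the
writer, which is where the saving in shuffle work would come from.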
--
This message was sent by Atlassian Jira
(v8.20.10#820010)