chenfengLiu created HUDI-4350:
---------------------------------

             Summary: reduce the shuffle work when we just insert but not 
update and delete
                 Key: HUDI-4350
                 URL: https://issues.apache.org/jira/browse/HUDI-4350
             Project: Apache Hudi
          Issue Type: Improvement
          Components: flink
            Reporter: chenfengLiu


As discussed in https://issues.apache.org/jira/browse/HUDI-4338, extra shuffle 
work adds network overhead and increases the risk of data skew.

So the current plan for building the Flink data stream that writes to Hudi can 
be improved on this point.

Currently, if we want to update or delete records, we need to load the index 
first, then send the index records and the Hoodie records to the Bucket Assign 
Operator.

The Bucket Assign Operator then builds the index state used to assign buckets 
to incoming records.

If we only insert new records, with no updates or deletes, we don't need this 
work, such as building the index and repartitioning the existing records.
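
To make the proposal concrete, here is a rough sketch of the plan choice 
described above. It is plain Java with no Hudi or Flink dependency. The class 
name, method names and stage labels are all hypothetical and only mirror the 
steps named in this issue; they are not Hudi's actual operators or API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the write plan; stage labels are illustrative only.
final class WritePlanSketch {

    // Returns the ordered stages a record passes through before it is written.
    static List<String> planStages(boolean insertOnly) {
        List<String> stages = new ArrayList<>();
        stages.add("source");
        if (!insertOnly) {
            // Updates and deletes must locate existing records, so the index
            // is loaded and records are repartitioned to the bucket assigner.
            stages.add("load-index");
            stages.add("shuffle-to-bucket-assign");
            stages.add("bucket-assign");
        }
        // Insert-only input skips the index and the repartitioning.
        stages.add("write");
        return stages;
    }

    // Counts the stages that move records across the network.
    static long shuffleCount(List<String> stages) {
        return stages.stream().filter(s -> s.startsWith("shuffle")).count();
    }

    public static void main(String[] args) {
        System.out.println("upsert plan:      " + planStages(false));
        System.out.println("insert-only plan: " + planStages(true));
    }
}
```

In this sketch the insert-only plan contains no shuffle stage at all, which is 
the saving the issue is asking for.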



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
