Ying Lin created HUDI-6188:
------------------------------

             Summary: Unify the logic of intra-partition upsert and 
cross-partition upsert in flink state index.
                 Key: HUDI-6188
                 URL: https://issues.apache.org/jira/browse/HUDI-6188
             Project: Apache Hudi
          Issue Type: Improvement
          Components: index
            Reporter: Ying Lin
            Assignee: Ying Lin


Currently, for an upsert within a single partition, Hudi keeps the record with 
the largest value of the field named by the {{precombine.field}} parameter.

This is widely used to handle out-of-order data: setting {{precombine.field}} 
to the event time keeps the record with the largest event time.
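
The intra-partition behavior described above can be modeled with a small 
sketch. This is not Hudi code: {{Record}}, {{upsert_within_partition}} and the 
{{event_time}} field are hypothetical names, and letting ties go to the later 
arrival is an assumption of the sketch.

```python
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    key: str         # record key
    partition: str   # partition path
    event_time: int  # stands in for the configured precombine field
    value: str


def upsert_within_partition(records: Iterable[Record]) -> Dict[str, Record]:
    """Keep, per record key, the record with the largest event_time."""
    table: Dict[str, Record] = {}
    for rec in records:
        current = table.get(rec.key)
        # Assumption of this sketch: on equal event_time the later arrival wins.
        if current is None or rec.event_time >= current.event_time:
            table[rec.key] = rec
    return table


# The record with the smaller event time arrives last (out of order).
arrivals = [
    Record("id1", "p1", 20, "new"),
    Record("id1", "p1", 10, "stale"),
]
result = upsert_within_partition(arrivals)
print(result["id1"].value)  # prints "new"
```

The late-arriving record with event time 10 is discarded, so the stored row 
still reflects the largest event time.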

However, with the FLINK_STATE index type, if an upsert moves a record to a 
different partition (a cross-partition upsert), the {{precombine.field}} 
parameter does not fully take effect.

In the cross-partition case, the current logic keeps the record that arrives 
later, even if its event time is smaller.

It may be necessary to unify the logic of intra-partition upsert and 
cross-partition upsert, so that the behavior is easier for users to understand 
and use.
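
To make the difference concrete, here is a minimal sketch that contrasts the 
cross-partition behavior described in this issue with one possible unified 
rule. It is a simplified model and not the FLINK_STATE index implementation: 
{{Record}}, {{upsert_global}} and the {{respect_precombine}} flag are 
hypothetical names introduced only for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    key: str         # record key
    partition: str   # partition path
    event_time: int  # stands in for the configured precombine field


def upsert_global(records: Iterable[Record],
                  respect_precombine: bool) -> Dict[str, Record]:
    """Model a global index: at most one stored record per key."""
    index: Dict[str, Record] = {}
    for rec in records:
        current = index.get(rec.key)
        if current is None:
            index[rec.key] = rec
            continue
        crosses_partition = rec.partition != current.partition
        if crosses_partition and not respect_precombine:
            # Behavior described in the issue: the later arrival wins,
            # regardless of event time.
            index[rec.key] = rec
        elif rec.event_time >= current.event_time:
            # Precombine rule: keep the record with the largest event time.
            index[rec.key] = rec
    return index


# The record with the smaller event time arrives last, in another partition.
arrivals = [Record("id1", "p1", 20), Record("id1", "p2", 10)]

as_reported = upsert_global(arrivals, respect_precombine=False)
unified = upsert_global(arrivals, respect_precombine=True)

print(as_reported["id1"].partition, as_reported["id1"].event_time)  # p2 10
print(unified["id1"].partition, unified["id1"].event_time)          # p1 20
```

Under the unified rule the cross-partition case gives the same answer as the 
intra-partition case: the record with event time 20 survives.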



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
