kazdy created HUDI-5272:
---------------------------

             Summary: Align with Flink to support no_precombine in spark
                 Key: HUDI-5272
                 URL: https://issues.apache.org/jira/browse/HUDI-5272
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: kazdy
            Assignee: kazdy


Flink supports {{public static final String NO_PRE_COMBINE = "no_precombine";}} 
(although not documented) for inserts and updates.

This was introduced by [#3874|https://github.com/apache/hudi/pull/3874].
https://issues.apache.org/jira/browse/HUDI-2633

{{When the precombine field is not specified, we use the proctime semantics, 
that means, the records come later are more fresh}}
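
As a rough illustration of the proctime semantics quoted above, the sketch below keeps the last-arriving record per key without consulting any ordering field. It is plain Python, not Hudi code; the function name {{merge_by_arrival}} and the {{id}} key are hypothetical and only serve the example.

```python
from typing import Dict, Iterable, List


def merge_by_arrival(records: Iterable[dict], key: str = "id") -> List[dict]:
    """Keep one record per key; a later arrival replaces an earlier one."""
    latest: Dict[object, dict] = {}
    for record in records:
        # No precombine field is compared: arrival order alone decides.
        latest[record[key]] = record
    return list(latest.values())


# Hypothetical batch: key 1 arrives twice.
batch = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
    {"id": 1, "value": "c"},  # arrives later, so it replaces value "a"
]
merged = merge_by_arrival(batch)
```

Note that nothing here can tell a late-arriving stale record from a fresh one, which is the core of the deduplication concern for updates.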

There is an argument against it: for updates, records cannot be deduplicated 
properly. At the same time, Hudi already allows non-strict insert mode, which 
breaks PK uniqueness.
Users can make an informed decision and either handle duplicates on their own 
or apply their own precombine logic (window functions etc.) before triggering 
the Hudi write.
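
A minimal sketch of the "bring your own precombine logic" option: deduplicate the incoming batch with a window function before handing it to the writer. To keep the example self-contained it uses Python's built-in {{sqlite3}} module (SQLite 3.25 or newer is needed for window functions) as a stand-in for Spark SQL; the table name {{incoming}} and the columns {{id}}, {{ts}} and {{value}} are assumptions made for illustration, and the actual Hudi write is not shown.

```python
import sqlite3

# In-memory table standing in for the incoming batch (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incoming (id INTEGER, ts INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO incoming VALUES (?, ?, ?)",
    [(1, 10, "a"), (1, 30, "c"), (1, 20, "b"), (2, 5, "x")],
)

# Rank rows within each key by ts (newest first) and keep only the top row.
DEDUP_SQL = """
SELECT id, ts, value
FROM (
    SELECT id, ts, value,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
    FROM incoming
) AS ranked
WHERE rn = 1
ORDER BY id
"""
deduplicated = conn.execute(DEDUP_SQL).fetchall()
conn.close()
```

The same {{ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)}} pattern is standard SQL, so the query should carry over to a Spark SQL job with the user's own key and ordering columns substituted.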



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
