kazdy created HUDI-5272:
---------------------------
Summary: Align with Flink to support no_precombine in spark
Key: HUDI-5272
URL: https://issues.apache.org/jira/browse/HUDI-5272
Project: Apache Hudi
Issue Type: Improvement
Reporter: kazdy
Assignee: kazdy
Flink supports disabling precombine via {{public static final String NO_PRE_COMBINE = "no_precombine";}}
(although this is not documented) for both inserts and updates.
This was introduced by [#3874|https://github.com/apache/hudi/pull/3874],
see https://issues.apache.org/jira/browse/HUDI-2633.
{{When the precombine field is not specified, we use the proctime semantics,
that means, the records come later are more fresh}}
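The sketch below illustrates the "proctime semantics" described above. It is a minimal plain-Java model under stated assumptions, not Hudi's implementation. The Rec type, its eventTs column and both method names are hypothetical. lastArrivalWins models no_precombine: within a batch, the record that arrives later replaces the earlier one for the same key. largestEventTsWins models a configured precombine field: the record with the larger ordering value wins. Breaking ties in favour of the later arrival is an assumption of this sketch. With out-of-order input, the two approaches choose different records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProctimeDedupSketch {

    // Hypothetical record: a key, a payload, and an event-time column that a
    // precombine field could point to (not a Hudi type).
    record Rec(String key, String value, long eventTs) {}

    // no_precombine-style (proctime) semantics: records are processed in
    // arrival order and a later record simply replaces an earlier one.
    static Map<String, Rec> lastArrivalWins(List<Rec> batch) {
        Map<String, Rec> out = new LinkedHashMap<>();
        for (Rec r : batch) {
            out.put(r.key(), r);
        }
        return out;
    }

    // Precombine-field semantics: keep the record with the largest eventTs.
    // Ties go to the later arrival (an assumption of this sketch).
    static Map<String, Rec> largestEventTsWins(List<Rec> batch) {
        Map<String, Rec> out = new LinkedHashMap<>();
        for (Rec r : batch) {
            out.merge(r.key(), r, (older, newer) ->
                newer.eventTs() >= older.eventTs() ? newer : older);
        }
        return out;
    }

    public static void main(String[] args) {
        // The eventTs=2 update arrives after the eventTs=3 one,
        // e.g. because the upstream source delivered them out of order.
        List<Rec> batch = List.of(
            new Rec("id1", "v1", 1L),
            new Rec("id1", "v3", 3L),
            new Rec("id1", "v2", 2L));
        System.out.println(lastArrivalWins(batch).get("id1").value());    // v2
        System.out.println(largestEventTsWins(batch).get("id1").value()); // v3
    }
}
```

The mismatch between the two results (v2 vs. v3) is exactly the case in which deduplication by arrival order can keep a stale version.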
One argument against this is that, for updates, records with the same key cannot
be deduplicated properly without an ordering field. At the same time, however,
Hudi already allows the non-strict insert mode, which can break primary-key
uniqueness.
Users can make an informed decision and either handle duplicates themselves or
apply their own precombine logic (e.g. with window functions) before triggering
the Hudi write.
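In Spark, user-side precombine logic is commonly written with the window-function pattern ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts DESC) followed by keeping row number 1. The plain-Java sketch below emulates that pattern so it can run without Spark. It is illustrative only: the Row type, the ts ordering column and keepLatestPerKey are hypothetical names, not part of Hudi or Spark. Ties on ts are resolved here by input order because the sort is stable. Add a tiebreaker column if the choice between tied rows matters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WindowDedupSketch {

    // Hypothetical input row: record key, payload, and a user-chosen ordering column.
    record Row(String key, String value, long ts) {}

    // Plain-Java emulation of
    //   ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts DESC) = 1
    static List<Row> keepLatestPerKey(List<Row> rows) {
        // PARTITION BY key (LinkedHashMap keeps first-seen key order).
        Map<String, List<Row>> partitions = new LinkedHashMap<>();
        for (Row r : rows) {
            partitions.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }
        List<Row> result = new ArrayList<>();
        for (List<Row> partition : partitions.values()) {
            // ORDER BY ts DESC within the partition (stable sort).
            List<Row> sorted = new ArrayList<>(partition);
            sorted.sort(Comparator.comparingLong(Row::ts).reversed());
            // Row number 1 is the first row after sorting.
            result.add(sorted.get(0));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("id1", "a", 1L),
            new Row("id2", "x", 5L),
            new Row("id1", "c", 3L),
            new Row("id1", "b", 2L));
        // Prints one row per key, the one with the largest ts:
        // id1 -> c, then id2 -> x.
        keepLatestPerKey(rows).forEach(r -> System.out.println(r.key() + " -> " + r.value()));
    }
}
```

The output of such a step contains at most one row per key. It could then be written with no_precombine without relying on arrival order.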
--
This message was sent by Atlassian Jira
(v8.20.10#820010)