[ https://issues.apache.org/jira/browse/HUDI-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-8316:
-----------------------------
    Description: 
The preCombine key is widely used to order Hudi payloads during both write 
and read. Previously, in Flink, we added the keyword 'no_precombine' as its 
value for scenarios that do not need field-based ordering (payloads then follow 
the proc_time sequence).

It would be valuable to generalize this use case and support tables without a 
preCombine key, using only the proc_time sequence to order the payloads.
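To make the difference concrete, here is a minimal Python sketch of 
deduplicating records that share a key, with and without an ordering field. 
`merge_records` and its parameters are hypothetical illustrations, not Hudi 
APIs; the sketch assumes the larger ordering value wins (ties going to the 
later arrival) and that input order stands in for the proc_time sequence.

```python
from typing import Any, Dict, Iterable, Optional

Record = Dict[str, Any]


def merge_records(records: Iterable[Record], key_field: str,
                  ordering_field: Optional[str]) -> Dict[Any, Record]:
    """Keep one record per key (hypothetical helper, not a Hudi API).

    With an ordering field, the record with the larger ordering value wins
    and ties go to the later arrival. Without one (ordering_field=None),
    records are taken in arrival (proc_time) order, so the last one wins.
    """
    merged: Dict[Any, Record] = {}
    for rec in records:  # iteration order stands in for the proc_time sequence
        key = rec[key_field]
        current = merged.get(key)
        if current is None or ordering_field is None:
            merged[key] = rec
        elif rec[ordering_field] >= current[ordering_field]:
            merged[key] = rec
    return merged
```

For example, with rows `{"id": 1, "ts": 5, "v": "a"}` followed by 
`{"id": 1, "ts": 3, "v": "b"}`, ordering by "ts" keeps "a", while passing 
`None` keeps the later arrival "b".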

The tricky part is that this field previously had a default value, "ts". I 
think we can use the proc_time sequence when the table schema does not include 
"ts", and keep ordering by "ts" when it does, for backward compatibility.
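The fallback proposed above could be resolved roughly as follows. This is a 
hypothetical sketch, not Hudi code: `resolve_ordering_field` is an illustrative 
name, `None` means "order by proc_time", and rejecting an explicitly configured 
field that is missing from the schema is an assumption the issue does not 
settle.

```python
from typing import Optional, Sequence

DEFAULT_PRECOMBINE_FIELD = "ts"   # the historical default named in the issue
NO_PRECOMBINE = "no_precombine"   # the Flink keyword named in the issue


def resolve_ordering_field(configured: Optional[str],
                           schema_fields: Sequence[str]) -> Optional[str]:
    """Return the field to order payloads by, or None for proc_time order.

    Hypothetical helper: `configured` is None when the user did not set
    the preCombine key at all.
    """
    if configured == NO_PRECOMBINE:
        return None
    if configured is None:
        # Backward compatibility: keep the "ts" default only if the schema has it.
        if DEFAULT_PRECOMBINE_FIELD in schema_fields:
            return DEFAULT_PRECOMBINE_FIELD
        return None
    if configured not in schema_fields:
        # Assumption: an explicitly configured but missing field is an error.
        raise ValueError(f"preCombine field {configured!r} not in schema")
    return configured
```

Returning `None` rather than a sentinel string keeps the "no ordering field" 
case explicit for whatever merge logic consumes the result.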

There is also a use case where the user wants to force the proc_time sequence 
of payloads; we can support it with a dedicated payload: 
ProcTimeAvroPayload.
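A rough sketch of how a forced proc_time payload would differ from an 
ordering-based one. Both classes and `pre_combine` are hypothetical Python 
stand-ins, not Hudi's Java payload interface or the actual 
ProcTimeAvroPayload; the list order is assumed to be the arrival (proc_time) 
order.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Dict, List


@dataclass
class OrderingPayload:
    """Hypothetical payload that keeps the record with the larger ordering value."""
    record: Dict[str, Any]
    ordering_value: Any

    def pre_combine(self, older: "OrderingPayload") -> "OrderingPayload":
        # Ties go to the newer payload (self).
        return older if older.ordering_value > self.ordering_value else self


@dataclass
class ProcTimePayload(OrderingPayload):
    """Hypothetical stand-in for the proposed ProcTimeAvroPayload.

    The ordering value is ignored: the payload that arrives later always wins.
    """

    def pre_combine(self, older: "OrderingPayload") -> "OrderingPayload":
        return self


def combine_in_arrival_order(payloads: List[OrderingPayload]) -> OrderingPayload:
    # Fold left to right: each newer payload is combined with the result so far.
    return reduce(lambda acc, newer: newer.pre_combine(acc), payloads)
```

With two events for the same key where the earlier one carries the larger "ts", 
the ordering-based payload keeps the earlier event, while the proc_time payload 
keeps the later one regardless of "ts".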

  was:
The preCombine key is widely used for ordering the Hudi payloads during both 
write and read, previously in Flink, we add a keyword 'no_precombine' as the 
value for scenarios that does not need the ordering(proc_time sequence)

It's valuable if we generalize this use case and support no preCombine key use 
cases, just use the proc_time sequence for the payloads ordering.

The tricky part is previously we have a default value: "ts" for this field, I 
think we can use proc_time sequence when the table schema does not include 
"ts", and use the "ts" for ordering for backward compatibility.

There is also use case that the use wanna to force the proc_time sequence of 
payloads, we can support this use case with a very specific payload: 
ProcTimeAvroPayload.


> No precombine key support for Spark write/read
> ----------------------------------------------
>
>                 Key: HUDI-8316
>                 URL: https://issues.apache.org/jira/browse/HUDI-8316
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: reader-core, writer-core
>            Reporter: Danny Chen
>            Priority: Major
>             Fix For: 1.1.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
