[ 
https://issues.apache.org/jira/browse/HUDI-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y Ethan Guo updated HUDI-8422:
------------------------------
    Description: 
[https://github.com/apache/hudi/pull/11943#discussion_r1809697340]

Could we make the ID easier to read, e.g.,
"00000000-0000-0000-0000-000000000000": Avro payload based merging;
"00000000-0000-0000-0000-000000000001": overwrite with latest;
"00000000-0000-0000-0000-000000000002": event time ordering / default hoodie 
payload (still reserving the existing ID as an alternative ID);
"00000000-0000-0000-0000-000000000003": validate duplicate key record merger;
etc.

Also, is the UUID format necessary? Shorter IDs such as "0", "1", "2" might be 
enough.

Another problem is that the current scheme still makes collisions hard to avoid, 
since there is no rule for how these IDs are generated. A general string with a 
namespace and a number might be better.
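
To make the namespace-plus-number idea concrete, here is a minimal sketch of 
how such IDs could be built and checked for collisions. Everything in it is an 
assumption for illustration only: the class MergeStrategyIdRegistry, the 
"namespace:number" layout, and the "hudi" namespace are hypothetical and are 
not part of Hudi's API or of the linked PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry for illustration; not an existing Hudi class.
final class MergeStrategyIdRegistry {
  // Maps each registered ID to a human-readable description.
  private final Map<String, String> strategies = new LinkedHashMap<>();

  // Builds an ID such as "hudi:2" from a namespace and a non-negative number.
  // The "namespace:number" layout is an assumed format, not one from the issue.
  static String formatId(String namespace, int number) {
    if (namespace == null || !namespace.matches("[a-z][a-z0-9_.-]*")) {
      throw new IllegalArgumentException("Invalid namespace: " + namespace);
    }
    if (number < 0) {
      throw new IllegalArgumentException("Number must be non-negative: " + number);
    }
    return namespace + ":" + number;
  }

  // Registers a strategy and fails fast if the ID is already taken.
  String register(String namespace, int number, String description) {
    String id = formatId(namespace, number);
    String existing = strategies.putIfAbsent(id, description);
    if (existing != null) {
      throw new IllegalStateException(
          "ID " + id + " is already registered for: " + existing);
    }
    return id;
  }

  // Returns the description for an ID, or null if the ID is unknown.
  String describe(String id) {
    return strategies.get(id);
  }
}
```

With a rule like this, built-in strategies could take IDs under one reserved 
namespace while custom mergers use their own, so the same number in two 
namespaces does not collide and a duplicate within one namespace is rejected 
at registration time.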

> Improve the format of record merge strategy ID
> ----------------------------------------------
>
>                 Key: HUDI-8422
>                 URL: https://issues.apache.org/jira/browse/HUDI-8422
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Y Ethan Guo
>            Priority: Major
>             Fix For: 1.1.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
