Hello All,

I have asked general questions about the record key in the Slack channel, but I
would like to consolidate everything about the Record Key here, including the
suggested best practices for constructing it to get better write performance.

Table Type: COW
Partition Path: Date

My record uniqueness is derived from a combination of 4 fields:

  1.  F1: Datetime (record’s origination datetime)
  2.  F2: String   (11-character serial number)
  3.  F3: UUID     (User Identifier)
  4.  F4: String   (12-character statistic name)

Note: My record is a nested document, and some of the above fields are nested
fields.

My Write Use Cases:

  1.  Writes to a partitioned Hudi table every 15 minutes, where:
        a.  95% of the records are inserts and 5% are updates
        b.  95% of the writes go to the same partition (current date), and
            the remaining 5% can span multiple partitions
  2.  GDPR requests to delete records from the table using the User
      Identifier field (F3)
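
To illustrate the GDPR use case: F3 is only one part of the record key, so the keys of a user's records have to be looked up before they can be deleted. The sketch below is a plain-Python stand-in for that lookup, not a Spark job. The field names `record_key` and `date` and the helper `keys_for_user` are hypothetical; `hoodie.datasource.write.operation` with the value `delete` is the Hudi write option I would expect to use for the actual delete.

```python
from typing import Dict, Iterable, List


def keys_for_user(records: Iterable[Dict[str, str]], user_id: str) -> List[Dict[str, str]]:
    """Return the record key and partition path of every record owned by user_id.

    Hypothetical helper: in a real job this would be a filter on F3 over
    the table, followed by a select of the key and partition columns.
    """
    return [
        {"record_key": r["record_key"], "date": r["date"]}
        for r in records
        if r["F3"] == user_id
    ]


# Write option for issuing the delete once the keys are known.
delete_options = {"hoodie.datasource.write.operation": "delete"}
```

Whichever key construction is chosen, this lookup by F3 stays the same, because none of the three approaches lets the key be derived from F3 alone.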


Record Key Construction:
Approach 1:
Generate a UUID from the concatenated string of all 4 fields
[e.g. str(F1) + "_" + str(F2) + "_" + str(F3) + "_" + str(F4)] and use the
newly generated field as the Record Key.
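
A minimal sketch of Approach 1 in Python, assuming a name-based (version 5) UUID so that the same four values always produce the same key. The namespace value and the function name `record_key_v1` are hypothetical, and ISO-8601 is just one possible string form of F1.

```python
import uuid
from datetime import datetime

# Hypothetical fixed namespace; any constant UUID works, as long as it never changes.
KEY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def record_key_v1(f1: datetime, f2: str, f3: uuid.UUID, f4: str) -> str:
    """Approach 1: deterministic UUID derived from all four fields."""
    raw = "_".join([f1.isoformat(), f2, str(f3), f4])
    return str(uuid.uuid5(KEY_NAMESPACE, raw))
```

A key built this way has a fixed length but carries no ordering: two records created one second apart get unrelated keys.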

Approach 2:
Generate a UUID from the concatenated string of the 3 fields other than the
datetime field (F1) [e.g. str(F2) + "_" + str(F3) + "_" + str(F4)], prepend the
datetime field to the generated UUID, and use the resulting field as the
Record Key, in the form F1_<uuid>.
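
A minimal sketch of Approach 2 in Python, under the same assumptions as for Approach 1: a name-based (version 5) UUID, a hypothetical fixed namespace, a hypothetical function name `record_key_v2`, and ISO-8601 as the string form of F1.

```python
import uuid
from datetime import datetime

# Hypothetical fixed namespace; any constant UUID works, as long as it never changes.
KEY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def record_key_v2(f1: datetime, f2: str, f3: uuid.UUID, f4: str) -> str:
    """Approach 2: F1 as a prefix, followed by a UUID of the other three fields."""
    raw = "_".join([f2, str(f3), f4])
    return f"{f1.isoformat()}_{uuid.uuid5(KEY_NAMESPACE, raw)}"
```

Because the ISO-8601 prefix sorts lexicographically in time order (for values with the same precision and time zone handling), keys built this way are ordered by origination time, which Approach 1 does not give.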

Approach 3:
Use a composite Record Key made up of all 4 fields (F1, F2, F3, F4).
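
A sketch of how Approach 3 might be configured through the Hudi Spark datasource options, written as a plain Python dict. The table name and the nested column names are hypothetical placeholders for my actual fields, and I am assuming nested fields can be referenced with dot notation in the record key setting.

```python
# Hypothetical nested column names standing in for F1..F4.
RECORD_KEY_FIELDS = [
    "event.created_at",      # F1
    "device.serial_number",  # F2
    "user.id",               # F3
    "stat.name",             # F4
]

hudi_options = {
    "hoodie.table.name": "stats_table",  # hypothetical table name
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    # Comma-separated list of the fields that make up the composite key.
    "hoodie.datasource.write.recordkey.field": ",".join(RECORD_KEY_FIELDS),
    "hoodie.datasource.write.partitionpath.field": "date",  # hypothetical column
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
}

# With a Spark DataFrame `df`, the options would be passed roughly as:
#   df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```

With this approach no extra key column has to be generated, but the key is longer than the fixed-length UUID of Approach 1.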

Which approach would you suggest, and why? Any help would be appreciated.

Regards,
Felix K Jose