Davis Zhang created HUDI-8005:
---------------------------------

             Summary: New lock provider implementation
                 Key: HUDI-8005
                 URL: https://issues.apache.org/jira/browse/HUDI-8005
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Davis Zhang


h2. Estimated effort: 2 days
 
*The new lock provider (LP) is DynamoDB-based only. ZooKeeper is out of scope here.*
 
As of today, an LP such as the DynamoDB one generates a per-table LP attribute 
{{partition-key}}, which is the name of the lock that readers and writers must 
acquire on the DDB side. Its schema is 
{{<table name>-<first 8 chars of the table uuid>}}, which ensures uniqueness 
and a 1-to-1 mapping between the key and the table. The table UUID is 
Onehouse-specific and is not accessible from the Hudi writer's context: a Hudi 
writer only has access to HoodieWriteConfig and HoodieTableConfig. As a result, 
Hudi writers initiated by SQL cannot derive the partition key.
 
The proposed solution is to change the schema of {{partition-key}} to 
{{<table name>-<hash of table base path>}}. Because a Hudi writer can derive 
both the table name and the table base path from its writer configs, this 
addresses the issue.
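 
A minimal sketch of the proposed key derivation, for illustration only. The 
function names {{base_path_hash}} and {{lock_partition_key}} are hypothetical, 
the example path is made up, and CRC32 from the Python standard library is used 
purely as a stand-in for whichever non-cryptographic hash is finally chosen.

```python
import zlib


def base_path_hash(base_path: str) -> str:
    """Return an 8-hex-digit digest of the table base path.

    CRC32 is only a stand-in here (an assumption for this sketch); the
    issue leaves the concrete hash function open.
    """
    digest = zlib.crc32(base_path.encode("utf-8")) & 0xFFFFFFFF
    return format(digest, "08x")


def lock_partition_key(table_name: str, base_path: str) -> str:
    """Build <table name>-<hash of table base path> from writer-side inputs."""
    return f"{table_name}-{base_path_hash(base_path)}"


if __name__ == "__main__":
    # Both inputs are values a writer can read from its own configs.
    print(lock_partition_key("trips", "s3://bucket/warehouse/trips"))
```

Every writer that sees the same table name and base path computes the same key, 
so no externally supplied UUID is needed.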
 
Properties of partition key:
 * {*}Uniqueness{*}: The lock key must be unique for each resource you want to 
lock. This ensures that different resources are locked independently.
 * {*}Meaningful Naming{*}: Use meaningful names for lock keys to make it clear 
what resource is being locked. This is particularly useful for debugging and 
maintenance.
 * {*}DynamoDB Partition Key Limits{*}: DynamoDB limits the size of partition 
keys. The maximum length of a partition key is 2048 bytes when using UTF-8 
encoding. Ensure your lock keys do not exceed this limit.
 ** As of today, Hudi does not enforce a length limit on the table name. The 
follow-up task is tracked here. It is beyond M1 scope.

 
For now, if the newly generated partition key exceeds 2048 bytes, *we will 
simply truncate the table name* so that the hash part still fits:
{{<table name>-<hash of table base path>}}
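 
The truncation rule could be sketched as follows. This is an illustration under 
assumptions, not the final implementation: {{fit_partition_key}} and 
{{MAX_KEY_BYTES}} are hypothetical names, and the choice to drop a partially 
cut multi-byte character (rather than fail) is an assumption of this sketch.

```python
MAX_KEY_BYTES = 2048  # DynamoDB partition key limit, measured in UTF-8 bytes


def fit_partition_key(table_name: str, path_hash: str,
                      max_bytes: int = MAX_KEY_BYTES) -> str:
    """Return <table name>-<hash>, truncating only the table name if needed."""
    suffix = "-" + path_hash
    budget = max_bytes - len(suffix.encode("utf-8"))
    if budget < 0:
        raise ValueError("hash suffix alone exceeds the key size limit")

    # Truncate on the encoded bytes, since the limit is in bytes, not chars.
    name_bytes = table_name.encode("utf-8")[:budget]
    # errors="ignore" drops a multi-byte character that was cut in half.
    name = name_bytes.decode("utf-8", errors="ignore")
    return name + suffix


if __name__ == "__main__":
    key = fit_partition_key("t" * 5000, "deadbeef")
    print(len(key.encode("utf-8")))
```

Because only the table name is shortened, the hash suffix is always kept whole, 
and it is the suffix that distinguishes tables whose truncated names coincide.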
 
h3. Hash function

We can use any mainstream non-cryptographic hash library, such as MurmurHash or 
FarmHash.
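 
To make the murmur option concrete, here is a small pure-Python sketch of the 
public MurmurHash3 x86 32-bit algorithm. It is for illustration only; a real 
implementation would most likely call a maintained library rather than 
hand-roll the hash, and the choice of the 32-bit variant and of seed 0 is an 
assumption of this sketch.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit of `data`, returned as an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF

    def rotl(x: int, r: int) -> int:
        # 32-bit left rotation.
        return ((x << r) | (x >> (32 - r))) & mask

    h = seed & mask
    n_blocks = len(data) // 4

    # Body: mix each 4-byte little-endian block into the state.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: mix the remaining 1 to 3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


if __name__ == "__main__":
    # Hypothetical base path, hashed to an 8-hex-digit suffix.
    path = "s3://bucket/warehouse/trips"
    print(format(murmur3_32(path.encode("utf-8")), "08x"))
```

A 32-bit digest renders as 8 hex characters, the same suffix length as the 
current UUID-based key. A wider hash lowers the collision probability at the 
cost of a longer key.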


