[ 
https://issues.apache.org/jira/browse/HUDI-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davis Zhang reassigned HUDI-8005:
---------------------------------

    Assignee: Davis Zhang

> New lock provider implementation
> --------------------------------
>
>                 Key: HUDI-8005
>                 URL: https://issues.apache.org/jira/browse/HUDI-8005
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Davis Zhang
>            Assignee: Davis Zhang
>            Priority: Major
>
> h2. Estimated effort: 2 days
>  
> *The new lock provider (LP) is DynamoDB-based only. ZooKeeper is out of 
> scope here.*
>  
> As of today, an LP such as the DynamoDB one generates a per-table LP 
> attribute {{partition-key}}, which is used as the name of the lock that 
> readers and writers must acquire on the DDB side. Its schema is 
> {{<table name>-<first 8 chars of the table uuid>}}, which ensures uniqueness 
> and a one-to-one mapping between the key and the table. The table UUID is 
> Onehouse-specific and is not accessible from a Hudi writer's context: a Hudi 
> writer only has access to HoodieWriteConfig and HoodieTableConfig. As a 
> result, Hudi writers initiated by SQL cannot derive the partition key.
>  
> The proposed solution is to change the schema of {{partition-key}} to 
> {{<table name>-<hash of table base path>}}. Because a Hudi writer can derive 
> both the table name and the table base path from its writer configs, this 
> addresses the issue.
>  
> Properties of the partition key:
>  * {*}Uniqueness{*}: The lock key must be unique for each resource you want 
> to lock. This ensures that different resources are locked independently.
>  * {*}Meaningful Naming{*}: Use meaningful names for lock keys to make it 
> clear what resource is being locked. This is particularly useful for 
> debugging and maintenance.
>  * {*}DynamoDB Partition Key Limits{*}: DynamoDB limits the size of 
> partition keys. The maximum length of a partition key is 2048 bytes when 
> using UTF-8 encoding. Ensure your lock keys do not exceed this limit.
>  ** As of today, Hudi does not enforce a length limit on table names. The 
> follow-up task is tracked here. It is beyond M1 scope.
>  
> For now, if the newly generated partition key exceeds 2048 bytes, *we 
> will simply truncate the table name* so that the hash part still fits:
> {{<table name>-<hash of table base path>}}
>  
> h3. Hash function
> We can use any mainstream non-cryptographic hash library, such as 
> MurmurHash or FarmHash.
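
The key scheme described above could be sketched roughly as follows. This is
only an illustration under stated assumptions, not the Hudi implementation:
the class name LockKeySketch and its methods are hypothetical, and a
hand-written 64-bit FNV-1a is used as a dependency-free stand-in for the
Murmur or FarmHash library the ticket suggests. The 2048-byte limit and the
"truncate the table name, keep the hash" rule are taken from the description.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the proposed partition-key scheme; not Hudi code. */
final class LockKeySketch {
    // DynamoDB partition-key size limit in UTF-8 bytes, as stated in the ticket.
    static final int MAX_KEY_BYTES = 2048;

    private LockKeySketch() {}

    /** 64-bit FNV-1a, a stand-in for a library hash such as Murmur or FarmHash. */
    static long fnv1a64(byte[] data) {
        long hash = 0xcbf29ce484222325L;      // FNV-1a 64-bit offset basis
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;           // FNV 64-bit prime
        }
        return hash;
    }

    /** Builds "<table name>-<hash of table base path>", truncating only the name. */
    static String partitionKey(String tableName, String basePath) {
        long h = fnv1a64(basePath.getBytes(StandardCharsets.UTF_8));
        // The suffix is pure ASCII, so its char count equals its UTF-8 byte count.
        String suffix = "-" + String.format("%016x", h);
        int nameBudget = MAX_KEY_BYTES - suffix.length();
        return truncateUtf8(tableName, nameBudget) + suffix;
    }

    /** Longest prefix of s whose UTF-8 encoding fits in maxBytes, cut on code points. */
    static String truncateUtf8(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break;
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }
}
```

A writer would call LockKeySketch.partitionKey(tableName, basePath) with the
two values it reads from its configs. Truncating on code-point boundaries
avoids cutting a multi-byte character in half, and hashing the base path
rather than the name means two tables with the same (possibly truncated) name
but different paths still get different keys, barring hash collisions.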



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
