[ https://issues.apache.org/jira/browse/HUDI-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davis Zhang reassigned HUDI-8005:
---------------------------------
Assignee: Davis Zhang
> New lock provider implementation
> --------------------------------
>
> Key: HUDI-8005
> URL: https://issues.apache.org/jira/browse/HUDI-8005
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Davis Zhang
> Assignee: Davis Zhang
> Priority: Major
>
> h2. Estimated effort: 2 days
>
> *The new LP is DynamoDB-based only. ZooKeeper is out of scope here.*
>
> As of today, lock providers (LPs) such as the DynamoDB LP generate a
> per-table LP attribute {{partition-key}}, which is used as the name of the
> lock that readers and writers must acquire on the DDB side. Its schema is
> {{<table name>-<first 8 chars of the table uuid>}}, which ensures uniqueness
> and a 1-to-1 mapping between the key and the table. However, the table UUID
> is Onehouse-specific and is not accessible from a Hudi writer's context: a
> Hudi writer only has access to HoodieWriteConfig and HoodieTableConfig. As a
> result, Hudi writers initiated via SQL have no way to derive the partition key.
>
> The proposed solution is to change the schema of {{partition-key}} to
> {{<table name>-<hash of table base path>}}. Since both the table name and the
> table base path can be derived from the writer configs, a Hudi writer can
> compute the key on its own, which addresses the issue.
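> A minimal sketch of the proposed derivation, assuming the table name and the
> base path have already been read from the writer configs. The class and method
> names ({{LockPartitionKeySketch}}, {{partitionKey}}) are hypothetical, not Hudi
> APIs, and {{java.util.zip.CRC32}} is used only as a dependency-free stand-in for
> the murmur/FarmHash-style hash this ticket proposes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper: derives "<table name>-<hash of table base path>". */
final class LockPartitionKeySketch {
  private LockPartitionKeySketch() {}

  static String partitionKey(String tableName, String basePath) {
    // Stand-in hash: CRC32 over the UTF-8 bytes of the base path.
    CRC32 crc = new CRC32();
    crc.update(basePath.getBytes(StandardCharsets.UTF_8));
    // getValue() returns the unsigned 32-bit checksum in a long;
    // format it as exactly 8 lowercase hex digits.
    return tableName + "-" + String.format("%08x", crc.getValue());
  }
}
```

> With this scheme, two tables can share a name without sharing a lock, because
> their base paths differ. A 32-bit hash keeps the suffix to 8 hex characters; a
> collision would require two tables with the same name whose base paths also
> hash to the same value.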
>
> Properties of the partition key:
> * {*}Uniqueness{*}: The lock key must be unique for each resource to be
> locked, so that different resources are locked independently.
> * {*}Meaningful Naming{*}: Lock keys should have meaningful names that make
> it clear which resource is being locked. This is particularly useful for
> debugging and maintenance.
> * {*}DynamoDB Partition Key Limits{*}: DynamoDB limits the size of partition
> keys: a partition key can be at most 2048 bytes in its UTF-8 encoding. Lock
> keys must not exceed this limit.
> ** As of today, Hudi does not enforce a length limit on table names. The
> follow-up task is tracked here. It is beyond the M1 scope.
>
> For now, if the newly generated partition key exceeds 2048 bytes,
> *we will simply truncate the table name* so that the hash part still fits,
> keeping the key in the form {{<table name>-<hash of table base path>}}.
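> A sketch of the truncation rule, assuming the 2048-byte limit is measured on
> the UTF-8 encoding of the whole key. {{PartitionKeyTruncation}} and its methods
> are hypothetical names. Only the table name is shortened, it is cut on
> code-point boundaries so a multi-byte character is never split, and the hash
> part is always kept intact.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds "<table name>-<hash>" within DynamoDB's key limit. */
final class PartitionKeyTruncation {
  // Maximum DynamoDB partition key length, in UTF-8 bytes.
  static final int MAX_PARTITION_KEY_BYTES = 2048;

  private PartitionKeyTruncation() {}

  /** Joins tableName and hash, truncating only the table name if the key is too long. */
  static String buildKey(String tableName, String hash) {
    String suffix = "-" + hash;
    int budget = MAX_PARTITION_KEY_BYTES - utf8Length(suffix);
    if (budget < 0) {
      throw new IllegalArgumentException("hash part alone exceeds the key limit");
    }
    return truncateUtf8(tableName, budget) + suffix;
  }

  /** Longest prefix of s whose UTF-8 encoding fits in maxBytes, never splitting a code point. */
  static String truncateUtf8(String s, int maxBytes) {
    int bytes = 0;
    int i = 0;
    while (i < s.length()) {
      int cp = s.codePointAt(i);
      // UTF-8 width of a code point. A lone surrogate is counted as 3 bytes,
      // which overestimates Java's 1-byte replacement, so the result still fits.
      int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
      if (bytes + cpBytes > maxBytes) {
        break;
      }
      bytes += cpBytes;
      i += Character.charCount(cp);
    }
    return s.substring(0, i);
  }

  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }
}
```

> Cutting on code-point boundaries means a table name made of 2-byte characters
> may yield a key one byte shorter than the limit, which is preferable to
> emitting an invalid UTF-8 sequence.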
>
> h3. Hash function
> We can use any mainstream non-cryptographic hash library, such as MurmurHash
> or FarmHash.
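> To illustrate what such a hash involves, here is a dependency-free sketch of
> MurmurHash3 (the x86, 32-bit variant) applied to a base path. The class and
> method names, the hex encoding and the seed of 0 are assumptions for
> illustration; a real implementation would more likely call an existing library
> than hand-roll the hash.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of MurmurHash3 x86_32 for hashing a table base path. */
final class Murmur3Sketch {
  private static final int C1 = 0xcc9e2d51;
  private static final int C2 = 0x1b873593;

  private Murmur3Sketch() {}

  static int hash32(byte[] data, int seed) {
    int h1 = seed;
    int len = data.length;
    int nBlocks = len / 4;
    // Body: mix each little-endian 4-byte block into the running state.
    for (int i = 0; i < nBlocks; i++) {
      int off = i * 4;
      int k1 = (data[off] & 0xff)
          | (data[off + 1] & 0xff) << 8
          | (data[off + 2] & 0xff) << 16
          | (data[off + 3] & 0xff) << 24;
      h1 ^= mixK1(k1);
      h1 = Integer.rotateLeft(h1, 13);
      h1 = h1 * 5 + 0xe6546b64;
    }
    // Tail: fold in the remaining 0-3 bytes.
    int tail = nBlocks * 4;
    int k1 = 0;
    switch (len & 3) {
      case 3:
        k1 ^= (data[tail + 2] & 0xff) << 16; // fall through
      case 2:
        k1 ^= (data[tail + 1] & 0xff) << 8; // fall through
      case 1:
        k1 ^= data[tail] & 0xff;
        h1 ^= mixK1(k1);
        break;
      default:
        break;
    }
    // Finalization: include the length, then avalanche the bits.
    h1 ^= len;
    return fmix32(h1);
  }

  private static int mixK1(int k1) {
    k1 *= C1;
    k1 = Integer.rotateLeft(k1, 15);
    k1 *= C2;
    return k1;
  }

  private static int fmix32(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  /** Hashes a base path (seed 0) and renders it as 8 lowercase hex digits. */
  static String hashBasePath(String basePath) {
    return String.format("%08x", hash32(basePath.getBytes(StandardCharsets.UTF_8), 0));
  }
}
```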
--
This message was sent by Atlassian Jira
(v8.20.10#820010)