Davis Zhang created HUDI-8005:
---------------------------------
Summary: New lock provider implementation
Key: HUDI-8005
URL: https://issues.apache.org/jira/browse/HUDI-8005
Project: Apache Hudi
Issue Type: Improvement
Reporter: Davis Zhang
h2. Estimated effort: 2 days
*The new lock provider (LP) is DynamoDB-based only. ZooKeeper is out of scope here.*
As of today, an LP such as the DynamoDB one generates a per-table LP attribute
{{partition-key}}, which is used as the name of the lock that readers and
writers must acquire on the DDB side. Its schema is {{<table name>-<first 8
chars of the table uuid>}}, which ensures uniqueness and a 1-to-1 mapping between the
key and the table. The table UUID is Onehouse-specific and is
not accessible from a Hudi writer's context: a Hudi writer only has access to
HoodieWriteConfig and HoodieTableConfig. This means that Hudi writers initiated
by SQL have no way to determine the partition key.
The proposed solution is to change the schema of {{partition-key}} to
{{<table name>-<hash of table base path>}}. Because a Hudi writer can derive
both the table name and the table base path from its writer configs, this
addresses the issue.
Properties of partition key:
* {*}Uniqueness{*}: The lock key must be unique
for each resource you want to lock. This ensures that different resources are
independently locked.
* {*}Meaningful Naming{*}: Use meaningful names for lock keys to make it clear
what resource is being locked. This is particularly useful for debugging and
maintenance.
* {*}DynamoDB Partition Key Limits{*}: DynamoDB has limits on the size of
partition keys. The maximum length for a partition key is 2048 bytes when using
UTF-8 encoding. Ensure your lock keys do not exceed this limit.
** As of today, Hudi does not enforce a length limit on the table name. The follow-up task
is tracked here. It is beyond the M1 scope.
For now, if the newly generated partition key exceeds 2048 bytes, *we will
simply truncate the table name* so that the hash part still fits:
{{<table name>-<hash of table base path>}}
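The truncation rule above could be sketched roughly as follows. This is only an illustration under assumptions: {{PartitionKeySketch}}, {{build}} and {{truncateUtf8}} are hypothetical names, not existing Hudi APIs, and the hash is assumed to arrive as an already-computed string. The table name is cut at a code point boundary so that no multi-byte UTF-8 character is split.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Hudi): builds "<table name>-<hash>" within a byte budget.
final class PartitionKeySketch {
    // DynamoDB partition key size limit in bytes, as stated in this ticket.
    static final int MAX_KEY_BYTES = 2048;

    static String build(String tableName, String pathHash) {
        String suffix = "-" + pathHash;
        // Bytes left for the table name once the "-<hash>" suffix is reserved.
        int budget = MAX_KEY_BYTES - suffix.getBytes(StandardCharsets.UTF_8).length;
        if (budget < 0) {
            throw new IllegalArgumentException("hash suffix alone exceeds the key limit");
        }
        return truncateUtf8(tableName, budget) + suffix;
    }

    // Keeps the longest prefix of s whose UTF-8 encoding fits in maxBytes,
    // cutting only at code point boundaries so no character is split.
    static String truncateUtf8(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            // UTF-8 length of this code point: 1 to 4 bytes.
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break;
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }
}
```

Only the table name is shortened, so the hash part, which is what distinguishes tables with the same name, always survives intact.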
h3. Hash function
We can use any mainstream non-cryptographic hash library such as murmur or
FarmHash.
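To make the idea concrete without pulling in a dependency, here is a minimal sketch that hashes the table base path with 64-bit FNV-1a. FNV-1a is used purely as a self-contained stand-in for murmur or FarmHash; it is not the hash proposed in this ticket. {{BasePathHashSketch}} and {{hashBasePath}} are hypothetical names, and the sketch assumes the base path string is hashed exactly as given, with no normalization.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Hudi): hex-encoded 64-bit FNV-1a of the base path.
final class BasePathHashSketch {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    static String hashBasePath(String basePath) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : basePath.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);   // FNV-1a: XOR the byte in first...
            hash *= FNV_PRIME;    // ...then multiply by the prime (wraps mod 2^64).
        }
        // Fixed-width, unsigned hex so the key suffix always has 16 characters.
        return String.format("%016x", hash);
    }
}
```

A fixed-width hex suffix keeps the byte cost of the hash part constant, which makes it easy to reserve space for it when the table name has to be truncated. Whatever function is chosen, every reader and writer must use the same one, since the key only works as a lock name if all parties derive the same value.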