[
https://issues.apache.org/jira/browse/HUDI-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382961#comment-17382961
]
Dave Hagman edited comment on HUDI-2173 at 7/19/21, 2:42 AM:
-------------------------------------------------------------
I have started to look into the various options we have when locking against
Dynamo. DynamoDB does support [optimistic
locking|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.OptimisticLocking.html].
There is also an AWS labs library, [DynamoDB Lock
Client|https://github.com/awslabs/amazon-dynamodb-lock-client].
The lock client (from AWS labs) has a lot of prebuilt functionality and,
according to the README, it is used extensively within AWS. What I don't like
is that it appears to be quite stale. The README claims there is a version
*1.2* in Maven, but it was never actually published. There are some recently
merged PRs, but the project doesn't seem to be actively maintained. For this
reason I don't think we should consider using it at this time.
The optimistic locking functionality in the native Java SDK looks promising. It
works by using a versioned attribute field. When a client wants to obtain a
lock, it must submit its view of the most recent row version with the acquire
request. If the row version submitted by the client does not match what's in
Dynamo (i.e. the client's view is stale), the update fails, which means we
could not lock the row/partition/etc. The client would then check for the
release of the lock (up until some maximum wait time). When the lock is
released, the client would again try to update the row using the latest version
it has seen. At that point the row update (to set the lock) can again succeed
or fail, depending on whether another process has already acquired the lock.
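To make the acquire/wait/retry flow concrete, here is a minimal in-memory
sketch in Java. It does not call DynamoDB or the AWS SDK: the LockRow class
stands in for a single lock row, and its synchronized tryAcquire method stands
in for a conditional write. All names (VersionedLockSketch, LockRow,
tryAcquire, release, acquireWithWait) are hypothetical. It also assumes the
condition checks that the row has no current owner, in addition to the version
match.

```java
/** Illustrative only: simulates version-checked locking in memory, no AWS SDK involved. */
class VersionedLockSketch {

    /** Hypothetical stand-in for one lock row in a DynamoDB table. */
    static final class LockRow {
        private long version = 0L;   // the versioned attribute
        private String owner = null; // null means the lock is free

        synchronized long readVersion() {
            return version;
        }

        synchronized boolean isLocked() {
            return owner != null;
        }

        /**
         * Simulated conditional write: succeeds only if the caller's view of
         * the version matches the stored one and (assumption) nobody holds the lock.
         */
        synchronized boolean tryAcquire(String client, long expectedVersion) {
            if (expectedVersion != version || owner != null) {
                return false; // stale view, or lock already held
            }
            owner = client;
            version++; // every successful write bumps the version
            return true;
        }

        /** Releases the lock if the caller owns it; the version is bumped again. */
        synchronized boolean release(String client) {
            if (!client.equals(owner)) {
                return false;
            }
            owner = null;
            version++;
            return true;
        }
    }

    /**
     * Re-reads the version and retries the conditional write until it succeeds
     * or maxWaitMillis has elapsed. Returns true if the lock was acquired.
     */
    static boolean acquireWithWait(LockRow row, String client,
                                   long maxWaitMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        do {
            long seen = row.readVersion();
            if (!row.isLocked() && row.tryAcquire(client, seen)) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        } while (System.currentTimeMillis() < deadline);
        return false;
    }
}
```

In this sketch, two clients that read the same version race on tryAcquire. Only
the first write succeeds, and the second client has to re-read the version and
retry once the lock has been released.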
> Implement a DynamoDB based LockProvider
> ---------------------------------------
>
> Key: HUDI-2173
> URL: https://issues.apache.org/jira/browse/HUDI-2173
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Writer Core
> Reporter: Vinoth Chandar
> Assignee: Dave Hagman
> Priority: Major
> Fix For: 0.10.0
>
>
> Currently, we have ZK and HMS based lock providers, which can be limited to
> coordinating across a single EMR or Hadoop cluster.
> For AWS users, DynamoDB is a readily available, fully managed,
> geo-replicated datastore that can be used to hold locks that span
> EMR/Hadoop clusters.
> This effort involves supporting a new `DynamoDB` lock provider that
> implements org.apache.hudi.common.lock.LockProvider. We can place the
> implementation itself in hudi-client-common, so it can be used across Spark,
> Flink, Deltastreamer etc.