[
https://issues.apache.org/jira/browse/HUDI-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Y Ethan Guo updated HUDI-8139:
------------------------------
Fix Version/s: 1.1.0
> Allow high priority writer to force-acquire lock within a bounded amount of
> time
> --------------------------------------------------------------------------------
>
> Key: HUDI-8139
> URL: https://issues.apache.org/jira/browse/HUDI-8139
> Project: Apache Hudi
> Issue Type: Wish
> Reporter: Krishen Bhan
> Priority: Trivial
> Fix For: 1.1.0
>
>
> *Issue*
> When using HUDI multiwriter with optimistic concurrency mode enabled, a
> writer's operation may need to acquire a table lock before performing some
> steps. If the writer has to wait a long time to acquire the lock, its
> operation's runtime increases, or the operation may fail outright if
> `hoodie.write.lock.wait_time_ms` is breached. Although the lock wait time is
> expected to typically be short, there are cases where a writer may need to
> wait for 10-20+ minutes before finally getting the lock:
> # The writer that currently holds the lock may have encountered a
> service/infra degradation that is causing it to get "stuck" in the middle of a
> transaction. For example, a user using Spark on YARN may encounter a delay
> while awaiting executor resources.
> # The writer that currently holds the lock may have to process many
> instant files in the base table and metadata table, due to delayed
> clean/archival/compaction.
> # If there are many concurrent writers writing to the dataset, each writer
> may have to wait a long time before finally getting its "chance" to
> acquire the lock.
> Although this issue can be partially mitigated by increasing
> `hoodie.write.lock.wait_time_ms` for a writer, doing so does not reduce the
> runtime of the operation. Some users would want to prioritize the
> delay/runtime of a specific writer even at the risk of failing other writers.
> For example, a user may want to give an ingestion writer high priority over
> other writers in order to ensure that the dataset isn't stale. To satisfy
> this use case, HUDI can add an enhancement that reduces the amount of
> time "high priority" writers need to wait before acquiring a lock.
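> For context, the configs involved in the scenario above look roughly like the
> following for a ZooKeeper-based multi-writer setup (values are illustrative,
> and property names should be checked against the Hudi version in use):

```properties
# Optimistic concurrency control with a ZooKeeper lock provider (illustrative values)
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
# Maximum time a writer waits for the lock before its operation fails
hoodie.write.lock.wait_time_ms=60000
hoodie.write.lock.zookeeper.url=zk-host
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.lock_key=my_table
hoodie.write.lock.zookeeper.base_path=/hudi/locks
```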
> *Resolutions*
> [Approach A]
> One approach could be to add a new timeout config that determines the maximum
> amount of time a writer can hold a lock before being forced to abort its
> operation and release it. Special care needs to be taken so that all DFS
> operations are aborted before the lock is released.
> This approach does not depend on the type of lock provider used. On the other
> hand, this also means that the changes proposed here will not be isolated to
> the lock provider class, and changes to other parts of HUDI need to be made.
> Another drawback of this approach is that it might not scale well
> with scenario 3: the user will have to keep track of the maximum number of
> writers on the dataset at any given time, and the time window each
> low-priority writer is allowed to hold the lock for (before aborting)
> will shrink as the number of low-priority writers on the dataset
> increases.
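> The abort-on-timeout mechanic of Approach A can be sketched in plain Java.
> `BoundedHoldLock` and its method names are hypothetical illustrations, not
> Hudi classes; a real implementation would wrap Hudi's lock provider and abort
> in-flight DFS work when the holder is interrupted, before releasing the lock:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of Approach A: a lock wrapper that interrupts the
// holder once a maximum hold time elapses, so the operation aborts before
// the lock is released.
class BoundedHoldLock {
    private final ReentrantLock inner = new ReentrantLock();
    private final ScheduledExecutorService watchdog =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lock-hold-watchdog");
                t.setDaemon(true);
                return t;
            });
    private volatile ScheduledFuture<?> pendingAbort;
    private volatile boolean aborted;

    // Wait up to waitMs for the lock; once held, the holder is interrupted
    // after maxHoldMs so in-flight work (e.g. DFS writes) can be aborted.
    boolean tryLock(long waitMs, long maxHoldMs) throws InterruptedException {
        if (!inner.tryLock(waitMs, TimeUnit.MILLISECONDS)) {
            return false;
        }
        aborted = false;
        final Thread holder = Thread.currentThread();
        pendingAbort = watchdog.schedule(() -> {
            aborted = true;
            holder.interrupt();
        }, maxHoldMs, TimeUnit.MILLISECONDS);
        return true;
    }

    void unlock() {
        if (pendingAbort != null) {
            pendingAbort.cancel(false);
        }
        inner.unlock();
    }

    boolean wasAborted() {
        return aborted;
    }
}
```

> A holder whose hold time expires sees an interrupt mid-operation, cleans up
> its DFS writes, and only then releases the lock.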
> [Approach B]
> Another approach would be to update ZookeeperLockProvider to allow assigning
> a priority to a lock acquirer, the idea being that the high-priority
> lock acquirer will "skip the line" and be the next job to take the lock. An
> existing Apache Curator recipe for a "priority lock", if one exists, could be
> used to update ZookeeperLockProvider; if not, a new ZooKeeper recipe may
> need to be modified/created, using
> [https://zookeeper.apache.org/doc/r3.1.2/recipes.html] as a starting point for
> research.
> This approach on its own may not be sufficient to solve this use case and may
> also require Approach A to be implemented, so that the current low-priority
> lock holder aborts within a bounded time window. Also, although this
> approach can isolate its changes to just the Zookeeper lock provider, it
> means that implementing this change will require delving into Zookeeper-
> specific semantics (and porting them to other distributed lock providers
> requested by users). Furthermore, this approach will only work on lock
> providers that maintain an internal logical "queue", like ZookeeperLockProvider.
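> The "skip the line" semantics of Approach B can be sketched in-process.
> `PriorityLock` is a hypothetical illustration; ZookeeperLockProvider would
> realize the same ordering with ZooKeeper sequence nodes rather than a local
> queue:

```java
import java.util.PriorityQueue;

// Hypothetical sketch of Approach B: waiters queue up with a priority and
// the highest-priority waiter is admitted first (FIFO within one level).
class PriorityLock {
    private static final class Waiter implements Comparable<Waiter> {
        final int priority;
        final long seq;
        Waiter(int priority, long seq) { this.priority = priority; this.seq = seq; }
        @Override public int compareTo(Waiter o) {
            if (priority != o.priority) {
                return Integer.compare(o.priority, priority); // higher priority first
            }
            return Long.compare(seq, o.seq); // FIFO within a priority level
        }
    }

    private final PriorityQueue<Waiter> waiters = new PriorityQueue<>();
    private long nextSeq;
    private boolean held;

    synchronized void lock(int priority) throws InterruptedException {
        Waiter me = new Waiter(priority, nextSeq++);
        waiters.add(me);
        // Block until the lock is free AND we are the best-ranked waiter.
        while (held || waiters.peek() != me) {
            wait();
        }
        waiters.remove(me);
        held = true;
    }

    synchronized void unlock() {
        held = false;
        notifyAll(); // wake everyone; the best-ranked waiter wins the re-check
    }

    synchronized int waitingCount() {
        return waiters.size();
    }
}
```

> Even if a low-priority writer has been waiting longer, a later high-priority
> arrival is admitted first on the next release.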
> [Approach C]
> Similar to approach B, except instead update the ZookeeperLockProvider to
> allow a high-priority writer to forcibly "steal" the lock by forcing the
> current lock holder to fail, until the writer finally acquires the lock. At a
> high level, the high-priority writer could have another "grace period" timer
> (smaller than `hoodie.write.lock.wait_time_ms`) that dictates the amount of
> time the writer will wait before informing the current lock owner that it
> should self-abort. Based on the docs, it seems Apache Curator might
> already support this: InterProcessMutex (the
> Apache Curator API that ZookeeperLockProvider uses) has an API to make the lock
> revocable
> [https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/locks/InterProcessMutex.html#makeRevocable(org.apache.curator.framework.recipes.locks.RevocationListener)]
> ; see
> [https://stackoverflow.com/questions/33374596/apache-curator-lock-recipes-revoking]
> for an example. Implementing this needs to be researched further, since, as
> mentioned in Approach A, HUDI should ensure that once the lock is released
> there are no ongoing DFS operations initiated by the writer still running.
> One potential implementation approach could be to have the (low-priority)
> lock owner, upon getting a revocation request, throw a runtime exception and
> fail without explicitly releasing the lock, the idea being that by the time
> the Zookeeper nodes clean up the ephemeral lock node (after identifying the
> writer's ZK client session as stale) any ongoing DFS operations would
> have completed. On the other end, the high-priority writer will, after
> waiting for the "grace period" to elapse, repeatedly revoke the lock.
> Similar to approach B, this approach has the drawback of requiring time to be
> spent delving into Zookeeper semantics (though likely less, since the existing
> Curator recipe used by HUDI seems to support lock revocation, even if only in a
> future version of Curator) and possibly having to do the work of porting it
> to other lock providers later (and not all distributed lock providers may
> support lock revocation).
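> The revocation handshake in Approach C can be sketched as follows.
> `RevocableLock` is a hypothetical stand-in modeled loosely on Curator's
> InterProcessMutex.makeRevocable(RevocationListener); in the real system the
> holder's listener would trigger the self-abort described above:

```java
// Hypothetical sketch of Approach C: the holder registers a revocation
// listener, and a higher-priority acquirer asks the holder to self-abort.
class RevocableLock {
    private final Object monitor = new Object();
    private boolean held;
    private Runnable revocationListener; // registered by the current holder

    // Acquire the lock, registering a callback invoked on a revocation request.
    void lock(Runnable onRevocationRequested) throws InterruptedException {
        synchronized (monitor) {
            while (held) {
                monitor.wait();
            }
            held = true;
            revocationListener = onRevocationRequested;
        }
    }

    void unlock() {
        synchronized (monitor) {
            held = false;
            revocationListener = null;
            monitor.notifyAll();
        }
    }

    // Called by the high-priority writer after its grace period elapses; the
    // current holder decides how to abort (e.g. fail without re-touching DFS).
    void requestRevocation() {
        Runnable listener;
        synchronized (monitor) {
            listener = revocationListener;
        }
        if (listener != null) {
            listener.run();
        }
    }
}
```

> In the Zookeeper version, the "listener" would be the holder watching its
> revocation znode, and the abort path would be the runtime exception plus
> ephemeral-node cleanup described above.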
> [Suggested approach]
> The suggested approach is C: although it requires delving into
> Zookeeper semantics more, it resolves scenario 3, and unlike approach B there
> seem to be more existing Zookeeper references for it, making it potentially
> easier to test/implement. Unfortunately, similar to A, more work would be
> needed to ensure that the lock owner has no pending DFS operation still in
> flight after the lock is released.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)