[
https://issues.apache.org/jira/browse/CASSANDRA-11258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250952#comment-15250952
]
Paulo Motta commented on CASSANDRA-11258:
-----------------------------------------
Sorry for the delay. See some improvement comments below.
From the code it seems that when an LWT insert times out, the
{{CasLockFactory}} assumes the lock was not acquired, but the operation may
actually have succeeded before the timeout, in which case we will not be able
to re-acquire the lock before it expires. We should therefore perform a read at
{{SERIAL}} level in this situation to make sure any previous in-progress
operations are committed and we get the most recent value.
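The "verify on timeout" pattern could be sketched roughly as below. This is a self-contained simulation, not the real code path: the class and method names ({{WriteTimeout}}, {{serialRead}}, the in-memory map) are all hypothetical stand-ins for the {{CasLockFactory}} internals and Cassandra's paxos read path.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: an LWT insert that times out may still have been applied, so a
// SERIAL read decides whether we actually hold the lock. All names here are
// hypothetical; the real CasLockFactory would use Cassandra's CAS/paxos
// machinery instead of this in-memory stand-in.
public class CasTimeoutReadBack {

    /** Thrown to simulate a coordinator-side LWT timeout. */
    static class WriteTimeout extends RuntimeException {}

    /** Minimal stand-in for the lock table: resource -> holder. */
    static final ConcurrentHashMap<String, String> lockTable = new ConcurrentHashMap<>();

    /** Simulated LWT "INSERT ... IF NOT EXISTS" that applies but then times out. */
    static void casInsertThatTimesOut(String resource, String holder) {
        lockTable.putIfAbsent(resource, holder); // write applied...
        throw new WriteTimeout();                // ...but the ack was lost
    }

    /** Simulated read at SERIAL: sees the committed value, if any. */
    static Optional<String> serialRead(String resource) {
        return Optional.ofNullable(lockTable.get(resource));
    }

    /** Returns true if we hold the lock, treating a timeout as "unknown, verify". */
    static boolean tryLock(String resource, String holder) {
        try {
            casInsertThatTimesOut(resource, holder);
            return true; // applied and acknowledged
        } catch (WriteTimeout e) {
            // The write may or may not have been applied; a SERIAL read commits
            // any in-progress paxos round and shows who actually holds the lock.
            return serialRead(resource).map(holder::equals).orElse(false);
        }
    }

    public static void main(String[] args) {
        boolean held = tryLock("repair-dc1", "node-a");
        System.out.println(held ? "lock held despite timeout" : "lock not held");
    }
}
```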
Is the {{sufficientNodesForLocking}} check necessary? I noticed that we are
doing non-LWT reads at {{ONE}}, but we should use {{QUORUM}} instead, and then
that check is done automatically when reading or writing.
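The reason the explicit check becomes redundant: a read or write at {{QUORUM}} already fails unless floor(RF/2) + 1 replicas respond, which is exactly the "enough nodes available" condition. A trivial sketch of the arithmetic (the method names are illustrative, not from the patch):

```java
// QUORUM requires floor(RF/2) + 1 live replicas, so any QUORUM read/write
// implicitly performs the same "sufficient nodes" check.
public class QuorumCheck {
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    static boolean sufficientNodes(int aliveReplicas, int replicationFactor) {
        return aliveReplicas >= quorum(replicationFactor);
    }

    public static void main(String[] args) {
        System.out.println(quorum(3)); // 2: RF=3 tolerates one replica down
    }
}
```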
I think we should adjust our nomenclature and mindset from distributed locks to
expiring leases, since this is what we are doing rather than distributed
locking. If you agree, can you rename classes to reflect that?
When renewing the lease we should also insert the current lease holder's
priority into the {{resource_lock_priority}} table, otherwise other nodes might
try to acquire the lease while it is being held (the operation will fail, but
the load on the system will be higher due to the extra LWTs).
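A sketch of what that renewal could look like; the {{resource_lock_priority}} name comes from the ticket, while the in-memory maps and method names are hypothetical stand-ins for the actual tables and queries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: renewal refreshes BOTH rows, the lease itself and the holder's entry
// in resource_lock_priority. Keeping the priority row alive tells other nodes
// the lease is held, so they can skip a doomed (and expensive) LWT attempt.
public class RenewWithPriority {
    // resource -> holder (would be the TTL'd lease table)
    static final Map<String, String> leases = new ConcurrentHashMap<>();
    // resource -> current holder's priority (would be resource_lock_priority)
    static final Map<String, Integer> priorities = new ConcurrentHashMap<>();

    static void renew(String resource, String holder, int priority) {
        leases.put(resource, holder);       // would be UPDATE ... USING TTL
        priorities.put(resource, priority); // re-insert the priority row too
    }

    /** Competing nodes consult the priority table before attempting an LWT. */
    static boolean shouldAttemptLwt(String resource, int myPriority) {
        Integer current = priorities.get(resource);
        return current == null || myPriority > current;
    }

    public static void main(String[] args) {
        renew("repair-dc1", "node-a", 5);
        System.out.println(shouldAttemptLwt("repair-dc1", 3)); // false: lease visibly held
    }
}
```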
We should probably also let lease holders renew leases explicitly rather than
auto-renewing them in the lease service, so that, for example, the job
scheduler can abort a job when it fails to renew the lease. To support that, we
should probably extend the {{DistributedLease}} interface with methods to renew
the lease and/or check whether it is still valid (perhaps we should have a look
at the [JINI lease spec|https://river.apache.org/doc/specs/html/lease-spec.html]
for inspiration, although it looks a bit verbose).
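The extended interface could look something like the sketch below, loosely following the JINI idea of an expiration the holder must explicitly push forward. Only {{DistributedLease}} itself comes from the patch under review; the method names and the trivial single-process implementation are assumptions.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of an explicit-renewal lease API. The LocalLease implementation is a
// single-process stand-in used only to exercise the contract; a real one would
// go through the CAS-based lease tables.
public class LeaseApiSketch {

    interface DistributedLease extends AutoCloseable {
        /** Try to push the expiration forward; false means the lease was lost. */
        boolean renew(Duration extension);
        /** True while the (locally tracked) expiration lies in the future. */
        boolean isValid();
        @Override void close();
    }

    static class LocalLease implements DistributedLease {
        private Instant expiry;
        private boolean released;

        LocalLease(Duration initial) { this.expiry = Instant.now().plus(initial); }

        @Override public boolean renew(Duration extension) {
            if (released || !isValid()) return false; // a lost lease cannot be renewed
            expiry = Instant.now().plus(extension);
            return true;
        }

        @Override public boolean isValid() {
            return !released && Instant.now().isBefore(expiry);
        }

        @Override public void close() { released = true; }
    }

    public static void main(String[] args) {
        try (DistributedLease lease = new LocalLease(Duration.ofSeconds(30))) {
            // The job scheduler can abort the job the moment renewal fails.
            if (!lease.renew(Duration.ofSeconds(30))) {
                System.out.println("lease lost, aborting job");
            }
        }
    }
}
```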
We should also use {{DateTieredCompactionStrategy}} on the lock tables to
reduce compaction load on these tables, but we can probably do that later since
we will need to tune it according to the TTL.
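For later reference, the compaction change could look something like this; the {{resource_lock_priority}} name comes from the ticket, and the window/TTL numbers are placeholders that would need to be tuned together, not recommendations:

```sql
-- Example only: base_time_seconds / max_sstable_age_days should be tuned
-- against the lease TTL; these values are placeholders.
ALTER TABLE resource_lock_priority
  WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'base_time_seconds': '60',
    'max_sstable_age_days': '1'
  }
  AND default_time_to_live = 600;
```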
> Repair scheduling - Resource locking API
> ----------------------------------------
>
> Key: CASSANDRA-11258
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11258
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Marcus Olsson
> Assignee: Marcus Olsson
> Priority: Minor
>
> Create a resource locking API & implementation that is able to lock a
> resource in a specified data center. It should handle priorities to avoid
> node starvation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)