[ https://issues.apache.org/jira/browse/CASSANDRA-11258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250952#comment-15250952 ]

Paulo Motta commented on CASSANDRA-11258:
-----------------------------------------

Sorry for the delay. See some improvement comments below.

From the code it seems that when an LWT insert times out, the
{{CasLockFactory}} assumes the lock was not acquired, but the operation may
actually have succeeded with only the response timing out, in which case we
will not be able to re-acquire the lock before it expires. So in this
situation we should perform a read at {{SERIAL}} level to make sure any
previous in-progress operations are committed and we get the most recent
value.
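The suggested fallback could look roughly like the sketch below. This is only an illustration of the idea, not the actual {{CasLockFactory}} code; {{CasClient}} and its method names are hypothetical stand-ins for the Cassandra session calls.

```python
# Hypothetical sketch; CasClient and its method names are illustrative,
# not the actual CasLockFactory API.
class CasTimeout(Exception):
    """The LWT insert timed out; the write may or may not have applied."""

class CasClient:
    """Stand-in for the session used by the lock factory."""
    def cas_insert_holder(self, resource, node):
        raise NotImplementedError
    def serial_read_holder(self, resource):
        raise NotImplementedError

def acquire(client, resource, node):
    try:
        return client.cas_insert_holder(resource, node)
    except CasTimeout:
        # The CAS may still have applied. A read at SERIAL commits any
        # in-progress Paxos round and returns the most recent holder,
        # instead of pessimistically assuming the lock was not acquired.
        return client.serial_read_holder(resource) == node

class TimedOutButApplied(CasClient):
    """Stub: the LWT 'times out' even though the write applied for node1."""
    def cas_insert_holder(self, resource, node):
        raise CasTimeout
    def serial_read_holder(self, resource):
        return "node1"

print(acquire(TimedOutButApplied(), "repair-dc1", "node1"))  # True
print(acquire(TimedOutButApplied(), "repair-dc1", "node2"))  # False
```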

Is the {{sufficientNodesForLocking}} check necessary? I noticed that we are
doing non-LWT reads at {{ONE}}, but we should use {{QUORUM}} instead, and then
that check will be performed automatically when reading or writing.

I think we should adjust our nomenclature and mindset from distributed locks to 
expiring leases, since this is what we are doing rather than distributed 
locking. If you agree, can you rename classes to reflect that?

When renewing the lease we should also insert the current lease holder's
priority into the {{resource_lock_priority}} table; otherwise other nodes
might try to acquire the lease while it is being held (the operation will
fail, but the load on the system will be higher due to the extra LWT round
trips).
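To make the renewal idea concrete, here is a toy sketch where two dicts stand in for the lease table and the {{resource_lock_priority}} table; the names, the TTL handling, and the function shape are all hypothetical.

```python
import time

# Toy model: two dicts stand in for the lease table and the
# resource_lock_priority table. Names and TTL handling are illustrative.
LEASE_TTL = 30  # seconds

leases = {}           # resource -> (holder, expiry)
lock_priorities = {}  # (resource, node) -> priority

def renew(resource, node, priority, now=None):
    now = time.time() if now is None else now
    holder, expiry = leases.get(resource, (None, 0))
    if holder != node or expiry <= now:
        return False  # lease lost; the caller should abort its job
    # Refresh the lease row *and* re-insert the holder's priority, so other
    # nodes see an active competitor and skip a doomed (and costly) LWT
    # acquisition attempt.
    leases[resource] = (node, now + LEASE_TTL)
    lock_priorities[(resource, node)] = priority
    return True

leases["repair-dc1"] = ("node1", time.time() + LEASE_TTL)
print(renew("repair-dc1", "node1", priority=2))  # True
print(renew("repair-dc1", "node2", priority=5))  # False: node2 is not holder
```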

We should also probably let lease holders renew leases explicitly rather than
auto-renewing them in the lease service, so that, for example, the job
scheduler can abort the job if it cannot renew the lease. For that matter, we
should probably extend the {{DistributedLease}} interface with methods to
renew the lease and/or check whether it is still valid (perhaps we should have
a look at the [JINI lease
spec|https://river.apache.org/doc/specs/html/lease-spec.html] for inspiration,
although it looks a bit verbose).
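One possible shape for such an interface, loosely inspired by the JINI lease spec, is sketched below. {{DistributedLease}} exists in the patch, but the method names here are suggestions, and the in-memory implementation is only there so the interface can be exercised.

```python
import abc
import time

# Hypothetical extension of DistributedLease with explicit renewal; the
# method names are suggestions, not the actual API from the patch.
class DistributedLease(abc.ABC):
    @abc.abstractmethod
    def renew(self, duration): ...
    @abc.abstractmethod
    def is_valid(self): ...
    @abc.abstractmethod
    def cancel(self): ...

class InMemoryLease(DistributedLease):
    """Toy implementation so the interface can be exercised locally."""
    def __init__(self, duration):
        self._expiry = time.time() + duration

    def renew(self, duration):
        # Explicit renewal lets the caller (e.g. the job scheduler) learn
        # that the lease was lost and abort the job, instead of a background
        # auto-renewer silently keeping it alive.
        if not self.is_valid():
            raise RuntimeError("lease already expired; job should abort")
        self._expiry = time.time() + duration

    def is_valid(self):
        return time.time() < self._expiry

    def cancel(self):
        self._expiry = 0.0

lease = InMemoryLease(duration=30)
print(lease.is_valid())   # True
lease.renew(duration=60)  # the job renews explicitly between units of work
lease.cancel()
print(lease.is_valid())   # False
```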

We should also use {{DateTieredCompactionStrategy}} on the lock tables to
reduce compaction load on these tables, but we can probably do that later
since we will need to tune it according to the TTL.
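For reference, switching a table to DTCS would look something like the CQL below; the keyspace/table name and the option values are placeholders that would need to be tuned to match the lease TTL.

```sql
-- Placeholder table name and option values, for illustration only.
ALTER TABLE system_distributed.resource_lock
  WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'max_sstable_age_days': 1
  };
```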

> Repair scheduling - Resource locking API
> ----------------------------------------
>
>                 Key: CASSANDRA-11258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11258
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Marcus Olsson
>            Assignee: Marcus Olsson
>            Priority: Minor
>
> Create a resource locking API & implementation that is able to lock a 
> resource in a specified data center. It should handle priorities to avoid 
> node starvation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
