[
https://issues.apache.org/jira/browse/CASSANDRA-11258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252011#comment-15252011
]
Marcus Olsson commented on CASSANDRA-11258:
-------------------------------------------
bq. From the code it seems that when an LWT insert times out, the CasLockFactory
assumes the lock was not acquired, but the operation may actually have succeeded
despite the timeout, in which case we will not be able to re-acquire the lock
before it expires. So we should perform a read at SERIAL level in this situation
to make sure any previous in-progress operations are committed and we get the
most recent value.
Good catch, I'll add that.
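Something along these lines, perhaps (just a sketch, the helper names are placeholders and not the actual patch code):
{code}
// Sketch only - casInsertLock() and readLockHolderSerial() are placeholder helpers.
private boolean tryLock(String resource, UUID localHost)
{
    try
    {
        // LWT: INSERT INTO resource_lock ... IF NOT EXISTS
        return casInsertLock(resource, localHost);
    }
    catch (WriteTimeoutException e)
    {
        // The CAS may still have been applied even though the coordinator timed out.
        // Reading at ConsistencyLevel.SERIAL commits any in-progress Paxos operation
        // and returns the most recent value, so we can tell if we actually got the lock.
        UUID holder = readLockHolderSerial(resource);
        return localHost.equals(holder);
    }
}
{code}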
bq. Is the sufficientNodesForLocking check necessary?
It is mostly to avoid trying to do CAS operations that we know will fail;
however, that check would be done further down in StorageProxy anyway, so it
might be redundant.
bq. I noticed that we are doing non-LWT reads at ONE, but we should use QUORUM
instead, and then that check will be done automatically when reading or writing.
I'll change that.
bq. I think we should adjust our nomenclature and mindset from distributed
locks to expiring leases, since this is what we are doing rather than
distributed locking. If you agree, can you rename classes to reflect that?
I agree, "leases" seems like a more reasonable term for it.
{quote}
When renewing the lease we should also insert the current lease holder priority
into the resource_lock_priority table, otherwise other nodes might try to
acquire the lease while it's being held (the operation will fail, but the load
on the system will be higher due to LWT).
We should also probably let lease holders renew leases explicitly rather than
auto-renewing leases at the lease service, so for example the job scheduler can
abort the job if it cannot renew the lease. For that matter, we should probably
extend the DistributedLease interface with methods to renew the lease and/or
check if it's still valid (perhaps we should have a look at the JINI lease spec
for inspiration, although it looks a bit verbose).
{quote}
I've taken a look at the JINI lease spec and I think there are some parts of it
that we wouldn't need, for instance {{setSerialFormat()}} and {{canBatch()}}.
But the interface could perhaps look like this instead:
{code}
interface Lease {
    long getExpiration();
    void renew(long duration) throws LeaseException;
    void cancel() throws LeaseException;
    boolean valid();
}

interface LeaseGrantor { // Or LeaseFactory
    Lease newLease(long duration, String resource, int priority, Map<String, String> metadata) throws LeaseException;
}
{code}
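With that interface the job scheduler could renew the lease explicitly between units of work and abort if the renewal fails, something like this (sketch only, the resource name, duration and task handling are made up):
{code}
// Sketch only - "repair-dc1", the 60s duration and the task list are placeholders.
void runRepairJob(LeaseGrantor grantor, List<Runnable> tasks) throws LeaseException
{
    Lease lease = grantor.newLease(60_000L, "repair-dc1", 1, Collections.emptyMap());
    try
    {
        for (Runnable task : tasks)
        {
            task.run();
            // Renew explicitly between tasks; if the renewal fails the job is
            // aborted by the LeaseException instead of running without the lease.
            lease.renew(60_000L);
        }
    }
    finally
    {
        lease.cancel();
    }
}
{code}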
I think the {{LeaseMap}} (mentioned in the JINI lease spec) or a similar
interface will be useful for locking multiple data centers. Maybe it's enough
to create some kind of {{LeaseCollection}} that bundles the leases together and
performs renew()/cancel() on all underlying leases?
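Something like this, perhaps (just a sketch of what I mean by {{LeaseCollection}}, assuming one lease per data center):
{code}
// Sketch only - bundles the per-data-center leases and renews/cancels them together.
class LeaseCollection
{
    private final Collection<Lease> leases;

    LeaseCollection(Collection<Lease> leases)
    {
        this.leases = leases;
    }

    void renewAll(long duration) throws LeaseException
    {
        for (Lease lease : leases)
            lease.renew(duration);
    }

    void cancelAll() throws LeaseException
    {
        for (Lease lease : leases)
            lease.cancel();
    }

    boolean allValid()
    {
        for (Lease lease : leases)
            if (!lease.valid())
                return false;
        return true;
    }
}
{code}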
--
I'll also change the keyspace name to {{system_leases}} and the tables to
{{resource_lease}} and {{resource_lease_priority}}.
> Repair scheduling - Resource locking API
> ----------------------------------------
>
> Key: CASSANDRA-11258
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11258
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Marcus Olsson
> Assignee: Marcus Olsson
> Priority: Minor
>
> Create a resource locking API & implementation that is able to lock a
> resource in a specified data center. It should handle priorities to avoid
> node starvation.