On Tue, Sep 22, 2009 at 2:13 PM, Iustin Pop <[email protected]> wrote:
> On Tue, Sep 22, 2009 at 10:17:49AM +0100, Guido Trotter wrote:
>> On Mon, Sep 21, 2009 at 5:02 PM, Iustin Pop <[email protected]> wrote:
>> > On Mon, Sep 21, 2009 at 04:59:35PM +0100, Guido Trotter wrote:
>> >> On Mon, Sep 21, 2009 at 4:41 PM, Iustin Pop <[email protected]> wrote:
>> >>
>> >> >
>> >> > Yes, but if we do all in one step, we don't need even the concept of
>> >> > "rollback".
>> >> >
>> >> > Yes, but no in the sense of sleep-with-wakeup and similar. All we need
>> >> > is a structure mapping lockable items to owners, and:
>> >> >  self._lock.acquire()
>> >> >  try:
>> >> >    if utils.all(has_no_owner(item) for item in items_to_lock):
>> >> >      for item in items_to_lock:
>> >> >        set_owner(item, target)
>> >> >      return True
>> >> >    else:
>> >> >      return False
>> >> >  finally:
>> >> >    self._lock.release()
>> >> >
>> >> > That would be all that is needed if the LockManager can deal with all
>> >> > the needed data. We don't need notifications.
>> >> >
>> >>
>> >> Yes but no. :) We can do this very easily, and also totally get rid
>> >> of the SharedLock and live with only the LockSet.
>> >> But on the other hand that loses us the ability to do a waiting
>> >> acquire, which at some point we want; otherwise we starve some jobs,
>> >> which conceivably will never be able to get all the locks they need,
>> >> and will just always fail.
>> >>
>> >> If you want something which can be tried a few times, but at some
>> >> point must wait-and-succeed, then that code has some problems (which
>> >> are solved today by the internal SharedLock).
>> >
>> > I disagree. How many times do we have a resource blocked for an
>> > infinite amount of time? Our problem is not starvation because one
>> > resource is hot, but wrong ordering of execution because of partial
>> > lock acquire. So unless we have more data, I'd be inclined to say
>> > 'starvation' is not really an issue (given our usage; this is not a
>> > theoretical locking library).
>> >
>>
>> Given our current usage, this cannot be a problem, because the current
>> locking library doesn't allow it.
>> It all depends on which model we want for a job:
>> 1. once submitted, it should eventually be able to proceed
>> 2. once submitted, it may or may not be able to proceed, is retried a
>> few times and then discarded
>>    (we can't keep retrying forever, or we risk just having a huge
>> queue of jobs waiting).
>>
>> Think about a big cluster: what is the probability that at least one
>> node or instance is locked? Will the watcher ever be able to proceed
>> if it just does a try-lock? Or at least often enough?
>
> Whether a node or instance is locked is independent of the locking
> model. It's the actual job usage that determines how much contention
> there is on a single resource.
>

Then your proposal is "let's only implement try-lock for groups of
resources" (at the same level, at least for now). Right?
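For reference, the all-or-nothing group try-acquire sketched earlier in the
thread could look roughly like this in Python. This is a minimal illustration,
not Ganeti's actual LockManager; the class and method names are hypothetical:

```python
import threading

class GroupTryLock:
    """Sketch: acquire a whole group of items atomically, or none of them.
    Hypothetical class, not Ganeti's real locking code."""

    def __init__(self):
        self._lock = threading.Lock()   # protects the owner map
        self._owners = {}               # item -> current owner

    def try_acquire(self, items, owner):
        """Return True and take all items for owner, or False and change
        nothing. No rollback is needed because nothing is assigned until
        every item is known to be free."""
        with self._lock:
            if any(item in self._owners for item in items):
                return False
            for item in items:
                self._owners[item] = owner
            return True

    def release(self, items, owner):
        with self._lock:
            for item in items:
                if self._owners.get(item) == owner:
                    del self._owners[item]

mgr = GroupTryLock()
ok_first = mgr.try_acquire(["node1", "node2"], "job-1")   # all free
ok_second = mgr.try_acquire(["node2", "node3"], "job-2")  # node2 is taken
```

As discussed above, a failed `try_acquire` simply returns False, so the
caller (e.g. the job queue) must decide whether to retry or discard the job.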

> The fact that the watcher locks all nodes/instances today is orthogonal
> and will fail on a big cluster independent of how fair the locking is.

No, it won't, if the locking is fair/blocking it will allow it to
eventually succeed, slowing down other jobs (as it does today).

Thanks,

Guido
