On Mon, Sep 21, 2009 at 5:02 PM, Iustin Pop <[email protected]> wrote:
> On Mon, Sep 21, 2009 at 04:59:35PM +0100, Guido Trotter wrote:
>> On Mon, Sep 21, 2009 at 4:41 PM, Iustin Pop <[email protected]> wrote:
>>
>> >
>> > Yes, but if we do all in one step, we don't even need the concept of
>> > "rollback".
>> >
>> > Yes, but not in the sense of sleep-with-wakeup and similar. All we
>> > need is a structure mapping lockable items to owners, and:
>> >  self._lock.acquire()
>> >  try:
>> >    if utils.all([has_no_owner(item) for item in items_to_lock]):
>> >      for item in items_to_lock:
>> >        self._owners[item] = target
>> >      return True
>> >    else:
>> >      return False
>> >  finally:
>> >    self._lock.release()
>> >
>> > That would be all that is needed if the LockManager can deal with all
>> > the needed data. We don't need notifications.
>> >
>>
>> Yes but no. :) We can do this very easily, and also completely get
>> rid of the SharedLock and live with just the LockSet.
>> But on the other hand that loses us the ability to do a waiting
>> acquire, which at some point we want; otherwise we starve some jobs,
>> which conceivably will never be able to get all the locks they need
>> and will just always fail.
>>
>> If you want something that can be retried later, but at some point
>> must wait-and-succeed, then that code has some problems (which are
>> solved today by the internal SharedLock).
>
> I disagree. How often do we have a resource blocked for an unbounded
> amount of time? Our problem is not that (starvation because one
> resource is a hot resource), but wrong ordering of execution because
> of partial lock acquisition. So unless we have more data, I'd be
> inclined to say 'starvation' is not really an issue (given our usage,
> as opposed to a theoretical general-purpose locking library).
>
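
(For concreteness, here is a minimal, self-contained sketch of the
all-or-nothing acquire described above. It is purely illustrative:
TryLockManager, the _owners dict and the method names are made up for
this mail, they are not the current locking code.)

  import threading

  class TryLockManager(object):
    """Illustrative only: non-blocking, all-or-nothing acquisition."""

    def __init__(self):
      self._lock = threading.Lock()  # protects _owners
      self._owners = {}              # lockable item -> current owner

    def acquire(self, owner, items):
      """Atomically take all of 'items' for 'owner', or none of them."""
      self._lock.acquire()
      try:
        # If any requested item already has an owner, fail without
        # touching anything, so the caller can simply retry later.
        for item in items:
          if item in self._owners:
            return False
        for item in items:
          self._owners[item] = owner
        return True
      finally:
        self._lock.release()

    def release(self, owner, items):
      """Release the items previously acquired by 'owner'."""
      self._lock.acquire()
      try:
        for item in items:
          if self._owners.get(item) == owner:
            del self._owners[item]
      finally:
        self._lock.release()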

Given our current usage, this cannot be a problem, because the current
locking library doesn't allow it.
It all depends on which model we want for a job:
1. once submitted, it should eventually be able to proceed
2. once submitted, it may or may not be able to proceed; it is retried
a few times, along the lines of the sketch below, and then discarded
   (we can't keep retrying forever, or we risk just having a huge
queue of jobs waiting).
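
(Again purely a sketch, reusing the made-up TryLockManager from above:
model 2 is essentially a bounded retry loop.)

  import time

  def run_with_retries(manager, owner, items, fn, retries=5, delay=30.0):
    """Try to grab all the locks a few times; give up if we never can."""
    for _ in range(retries):
      if manager.acquire(owner, items):
        try:
          return fn()
        finally:
          manager.release(owner, items)
      time.sleep(delay)  # back off before trying again
    raise RuntimeError("could not acquire %s after %d attempts" %
                       (items, retries))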

Think about a big cluster: what is the probability that at least one
node or instance is locked? Will the watcher ever be able to proceed
if it just does try/lock? Or at least often enough?
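
(Back-of-the-envelope, with purely made-up numbers: if each of 200
nodes/instances is independently locked 1% of the time, the chance
that at least one of them is locked at any given moment is
1 - 0.99^200, roughly 87%, so a watcher that only ever does try/lock
over all of them would be turned away most of the time.)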

Thanks,

Guido
