On Mon, Sep 21, 2009 at 5:02 PM, Iustin Pop <[email protected]> wrote:
> On Mon, Sep 21, 2009 at 04:59:35PM +0100, Guido Trotter wrote:
>> On Mon, Sep 21, 2009 at 4:41 PM, Iustin Pop <[email protected]> wrote:
>>
>> >
>> > Yes, but if we do all in one step, we don't need even the concept of
>> > "rollback".
>> >
>> > Yes, but no in the sense of sleep-with-wakeup and similar. All we need
>> > is a structure mapping lockable items to owners, and:
>> >   self._lock.acquire()
>> >   try:
>> >     if utils.all(has_no_owner(items_to_lock)):
>> >       for item in items_to_lock:
>> >         owner = target
>> >       return True
>> >     else:
>> >       return False
>> >   finally:
>> >     self._lock.release()
>> >
>> > That would be all that is needed if the LockManager can deal with all
>> > the needed data. We don't need notifications.
>> >
>>
>> Yes but no. :) We can do this very easily, and also totally get rid of
>> the SharedLock, and only live with LockSet.
>> But on the other hand that totally loses us the ability of a waiting
>> acquire, which at some point we want, otherwise we starve some jobs,
>> which conceivably will never be able to get all the locks they need,
>> and just always fail.
>>
>> If you want something which can be tried later, but at some point must
>> wait-and-succeed, then that code has some problems (which are solved
>> today, by the internal SharedLock)
>
> I disagree. How many times we have any resource blocked by an infinite
> amount of time? Our problem is not this (starvation because one resource
> is a hot resource), but wrong ordering of execution because of partial
> lock acquire. So unless we have more data, I'd be inclined to say
> 'starvation' is not really an issue (given our usage, not in a
> theoretical locking library).
>
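For concreteness, the all-or-nothing acquire in the quoted code could look
roughly like the following self-contained sketch. The LockManager class, the
_owners map and the try_acquire()/release() names here are assumptions for
illustration only, not Ganeti's actual locking API.

  # A minimal sketch, assuming a standalone item -> owner map; not Ganeti code.
  import threading


  class LockManager(object):
    """Maps lockable items to their current owner.

    try_acquire() either grants every requested item to the caller or grants
    nothing, so a caller can never end up holding a partial set of locks.
    """

    def __init__(self):
      self._lock = threading.Lock()  # protects the owner map itself
      self._owners = {}              # item -> owner; absent means free

    def try_acquire(self, owner, items):
      """Atomically acquire all of items for owner, or fail with no side effects."""
      self._lock.acquire()
      try:
        if all(item not in self._owners for item in items):
          for item in items:
            self._owners[item] = owner
          return True
        return False
      finally:
        self._lock.release()

    def release(self, owner, items):
      """Release the given items, checking that owner really holds them."""
      self._lock.acquire()
      try:
        for item in items:
          assert self._owners.get(item) == owner, "releasing an item we don't own"
          del self._owners[item]
      finally:
        self._lock.release()

A waiting acquire, by contrast, would need this owner map plus some wake-up
mechanism (for example a condition variable) so that blocked callers are
notified when items are released, which is roughly what the internal
SharedLock provides today.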
Given our current usage, this cannot be a problem, because the current
locking library doesn't allow it. It all depends on which model we want
for a job:

1. once submitted, it should eventually be able to proceed
2. once submitted, it may or may not be able to proceed; it is retried a
   few times and then discarded (we can't keep retrying forever, or we
   risk just having a huge queue of jobs waiting) -- see the sketch below

Think about a big cluster: what is the probability that at least one node
or instance is locked? Will the watcher ever be able to proceed if it
just does try/lock? Or at least often enough?

Thanks,

Guido
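To make the trade-off concrete, here is a rough sketch of the second model
(retry a few times, then discard), written against the hypothetical
LockManager sketched earlier in this thread. MAX_RETRIES, RETRY_DELAY and
run_job() are made-up names for illustration, not Ganeti code.

  # Sketch of the bounded-retry job model, built on the hypothetical
  # LockManager.try_acquire() above; the constants are made-up tuning knobs.
  import time

  MAX_RETRIES = 5
  RETRY_DELAY = 1.0  # seconds to wait between attempts


  def run_job(manager, job_id, needed_items, execute_fn):
    """Try to grab all locks a bounded number of times, then give up.

    On a busy cluster every attempt may find at least one node or instance
    already locked, so a job under this model can fail permanently even
    though no single lock is ever held for very long.
    """
    for _ in range(MAX_RETRIES):
      if manager.try_acquire(job_id, needed_items):
        try:
          return execute_fn()
        finally:
          manager.release(job_id, needed_items)
      time.sleep(RETRY_DELAY)
    raise RuntimeError("Job %s could not acquire its locks, discarding it" % job_id)

Under this model a frequently submitted job such as the watcher can keep
losing the race on a big cluster, which is exactly the starvation question
raised above; a waiting acquire avoids that, at the cost of bringing back
blocking and wake-up notifications.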
