On Tue, Sep 22, 2009 at 2:13 PM, Iustin Pop <[email protected]> wrote:
> On Tue, Sep 22, 2009 at 10:17:49AM +0100, Guido Trotter wrote:
>> On Mon, Sep 21, 2009 at 5:02 PM, Iustin Pop <[email protected]> wrote:
>> > On Mon, Sep 21, 2009 at 04:59:35PM +0100, Guido Trotter wrote:
>> >> On Mon, Sep 21, 2009 at 4:41 PM, Iustin Pop <[email protected]> wrote:
>> >>
>> >> >
>> >> > Yes, but if we do it all in one step, we don't even need the
>> >> > concept of "rollback".
>> >> >
>> >> > Yes, but not in the sense of sleep-with-wakeup and similar. All we
>> >> > need is a structure mapping lockable items to owners, and:
>> >> >
>> >> >   self._lock.acquire()
>> >> >   try:
>> >> >     if utils.all(has_no_owner(items_to_lock)):
>> >> >       for item in items_to_lock:
>> >> >         self._owners[item] = target
>> >> >       return True
>> >> >     else:
>> >> >       return False
>> >> >   finally:
>> >> >     self._lock.release()
>> >> >
>> >> > That would be all that is needed, if the LockManager can deal with
>> >> > all the needed data. We don't need notifications.
>> >> >
>> >>
>> >> Yes, but no. :) We can do this very easily, and also totally get rid
>> >> of the SharedLock and live only with LockSet.
>> >> On the other hand, though, that loses us the ability to do a waiting
>> >> acquire, which at some point we want; otherwise we starve some jobs,
>> >> which conceivably will never be able to get all the locks they need,
>> >> and will just always fail.
>> >>
>> >> If you want something which can be tried later, but at some point
>> >> must wait-and-succeed, then that code has some problems (which are
>> >> solved today by the internal SharedLock).
>> >
>> > I disagree. How many times do we have a resource blocked for an
>> > infinite amount of time? Our problem is not starvation because one
>> > resource is a hot resource, but wrong ordering of execution because
>> > of partial lock acquires.
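[Editor's note: a runnable sketch of the all-or-nothing try-acquire quoted above. The class and method names (`LockManager`, `try_acquire`, `_owners`) are illustrative, not Ganeti's actual API; it uses the builtin `all` in place of `utils.all`.]

```python
import threading

class LockManager:
    """Hypothetical all-or-nothing lock manager: a single internal lock
    protects a map of lockable items to their current owners."""

    def __init__(self, items):
        self._lock = threading.Lock()
        # None means the item is currently unowned (free).
        self._owners = dict.fromkeys(items, None)

    def try_acquire(self, items_to_lock, target):
        """Atomically acquire all of items_to_lock for `target`, or none.

        Returns True on success, False if any item is already owned.
        There is no waiting and no rollback: either every item was free
        and all are taken in one step, or nothing changes.
        """
        with self._lock:
            if all(self._owners[item] is None for item in items_to_lock):
                for item in items_to_lock:
                    self._owners[item] = target
                return True
            return False

    def release(self, items, target):
        """Release the items currently owned by `target`."""
        with self._lock:
            for item in items:
                if self._owners[item] == target:
                    self._owners[item] = None
```

For example, a second job's try-lock fails immediately (rather than blocking) while the first job holds any of the requested items, which is exactly the property debated below.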
>> > So unless we have more data, I'd be inclined to say 'starvation' is
>> > not really an issue (given our usage; not in a theoretical locking
>> > library).
>> >
>>
>> Given our current usage, this cannot be a problem, because the current
>> locking library doesn't allow it.
>> It all depends on which model we want for a job:
>>   1. once submitted, it should eventually be able to proceed
>>   2. once submitted, it may or may not be able to proceed; it is
>>      retried a few times and then discarded
>> (we can't keep retrying forever, or we risk just having a huge queue
>> of jobs waiting).
>>
>> Think about a big cluster: what is the probability that at least one
>> node or instance is locked? Will the watcher ever be able to proceed
>> if it just does try/lock? Or at least often enough?
>
> Whether a node or instance is locked is independent of the locking
> model. It's the actual job usage that determines how much contention
> there is on a single resource.
>
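[Editor's note: model 2 above (bounded retries, then discard) can be sketched as a simple loop on top of any try-lock primitive. `try_acquire_all`, `retries`, and `delay` are hypothetical names for illustration, not part of Ganeti.]

```python
import time

def acquire_with_retries(try_acquire_all, items, retries=5, delay=0.1):
    """Attempt an all-or-nothing acquire up to `retries` times.

    try_acquire_all(items) -> bool is the non-blocking primitive.
    Returns True on success; False once the retry budget is exhausted,
    at which point the caller would discard the job rather than queue it
    forever.
    """
    for _ in range(retries):
        if try_acquire_all(items):
            return True
        time.sleep(delay)
    return False
```

The trade-off Guido raises is visible here: on a busy cluster the loop may exhaust its budget every time, so a job like the watcher could fail repeatedly instead of eventually running.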
Then your proposal is "let's only implement try-lock for groups of
resources" (at the same level, at least for now). Right?

> The fact that the watcher locks all nodes/instances today is
> orthogonal, and will fail on a big cluster independent of how fair the
> locking is.

No, it won't: if the locking is fair/blocking, it will allow the watcher
to eventually succeed, slowing down other jobs (as it does today).

Thanks,

Guido
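[Editor's note: the blocking/fair alternative Guido refers to, where an acquire waits until every requested item is free instead of failing, is essentially a condition-variable loop. A minimal sketch, assuming the same hypothetical owner map as above; Ganeti's real SharedLock/LockSet are more elaborate.]

```python
import threading

class BlockingLockManager:
    """Hypothetical waiting acquire: blocks until all items are free."""

    def __init__(self, items):
        self._cond = threading.Condition()
        self._owners = dict.fromkeys(items, None)

    def acquire(self, items, target):
        """Block until every requested item is unowned, then take them all.

        Unlike a try-lock, this eventually succeeds (assuming holders
        release), at the cost of delaying the caller and other jobs.
        """
        with self._cond:
            while not all(self._owners[i] is None for i in items):
                self._cond.wait()
            for i in items:
                self._owners[i] = target

    def release(self, items, target):
        with self._cond:
            for i in items:
                if self._owners[i] == target:
                    self._owners[i] = None
            # Wake all waiters so they can re-check their item sets.
            self._cond.notify_all()
```

Note this simple version is not actually fair (a waiter needing many items can still lose every wakeup race to waiters needing few), which is part of why a real implementation needs more machinery.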
