On 10-06-18 09:37, Sunil Mushran wrote:
> On 06/17/2010 07:37 PM, Wengang Wang wrote:
> >On 10-06-17 08:06, Sunil Mushran wrote:
> >>On 06/15/2010 11:06 PM, Wengang Wang wrote:
> >>>Still the question.
> >>>If you have sent a DEREF request to the master, and the lockres became
> >>>in-use again, then the lockres remains in the hash table and also in the
> >>>purge list. So
> >>>1) If this node is the last ref, there is a possibility that the master
> >>>purged the lockres after receiving the DEREF request from this node. In
> >>>this case, when this node does dlmlock_remote(), the lockres won't be
> >>>found on the master. How to deal with it?
> >>>
> >>>2) The lockres on this node is going to be purged again, which means it
> >>>will send secondary DEREFs to the master. This is not good, I think.
> >>>
> >>>A thought is setting lockres->owner to DLM_LOCK_RES_OWNER_UNKNOWN after
> >>>sending a DEREF request against this lockres, and also redoing the master
> >>>request before locking on it.
> >>The fix we are working towards is to ensure that we set
> >>DLM_LOCK_RES_DROPPING_REF once we are determined to purge the lockres.
> >>As in, we should not let go of the spinlock before we have either set the
> >>flag or decided against purging that resource.
> >>
> >>Once the flag is set, new users looking up the resource via
> >>dlm_get_lock_resource() will notice the flag and will then wait for that
> >>flag to be cleared before looking up the lockres hash again. If all goes
> >>well, the lockres will not be found (because it has since been unhashed)
> >>and it will be forced to go through the full mastery process.
> >That is ideal.
> >In many cases, though, the lockres is not obtained via
> >dlm_get_lock_resource() but via dlm_lookup_lockres()/__dlm_lookup_lockres(),
> >which doesn't set the new IN-USE state directly. dlm_lookup_lockres() takes
> >and drops dlm->spinlock, and some callers of __dlm_lookup_lockres() drop the
> >spinlock as soon as they get the lockres. Such paths access the lockres
> >later, after dropping dlm->spinlock and res->spinlock.
> >So there is a window in which dlm_thread() gets a chance to take
> >dlm->spinlock and res->spinlock and set the DROPPING_REF state.
> >So whether new users can get the lockres depends on how "new" the lookup
> >is. If it finds the lockres after the DROPPING_REF state is set, sure, it
> >works well. But if it finds it before DROPPING_REF is set, that won't
> >protect the lockres from purging, since even if it "gets" the lockres, the
> >lockres can still be in an unused state.
> dlm_lookup_lockres() and friends just look up the lockres hash.
> dlm_get_lock_resource() also calls it. It in turn is called by dlmlock()
> to find and/or create a lockres and create a lock on that resource.
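
To make the purge/lookup ordering concrete, here is a minimal user-space
model of the dlm_get_lock_resource() path Sunil describes above (it does not
cover the dlm_lookup_lockres() paths discussed next). It is only a sketch:
the mutex stands in for dlm->spinlock/res->spinlock, the dropping_ref field
stands in for DLM_LOCK_RES_DROPPING_REF, and the struct, helpers and
condition variable are illustrative, not the actual fs/ocfs2/dlm code.

/*
 * Minimal user-space model of the ordering described above -- NOT the
 * actual fs/ocfs2/dlm code. The purge side sets DROPPING_REF (or decides
 * not to purge) before letting go of the lock, and a lookup that sees
 * DROPPING_REF waits for the flag to clear and then redoes the lookup.
 */
#include <pthread.h>
#include <stdbool.h>

struct lockres {
	pthread_mutex_t lock;     /* stands in for res->spinlock */
	pthread_cond_t  flag_cv;  /* illustrative; the kernel waits differently */
	bool dropping_ref;        /* stands in for DLM_LOCK_RES_DROPPING_REF */
	bool hashed;              /* still visible in the lockres hash? */
};

struct lockres example_res = {
	.lock    = PTHREAD_MUTEX_INITIALIZER,
	.flag_cv = PTHREAD_COND_INITIALIZER,
	.dropping_ref = false,
	.hashed = true,
};

/* Purge side (roughly the role of dlm_thread): the decision to purge and
 * the setting of the flag happen under the same lock, never in between. */
void purge_lockres(struct lockres *res)
{
	pthread_mutex_lock(&res->lock);
	res->dropping_ref = true;          /* committed to purging */
	pthread_mutex_unlock(&res->lock);

	/* ... send DEREF to the master ... */

	pthread_mutex_lock(&res->lock);
	res->hashed = false;               /* unhash */
	res->dropping_ref = false;
	pthread_cond_broadcast(&res->flag_cv);
	pthread_mutex_unlock(&res->lock);
}

/* Lookup side (roughly the role of dlm_get_lock_resource): if the flag is
 * set, wait for it to clear and report "not found" so the caller goes
 * through the full mastery path instead of reusing a dying lockres. */
bool lookup_lockres(struct lockres *res)
{
	bool found;

	pthread_mutex_lock(&res->lock);
	while (res->dropping_ref)
		pthread_cond_wait(&res->flag_cv, &res->lock);
	found = res->hashed;               /* real code re-walks the hash here */
	pthread_mutex_unlock(&res->lock);
	return found;
}

The only point of the model is the ordering: the flag goes up before the
lock is released, so any lookup that checks the flag under the lock either
sees it and waits, or ran before the purge decision was made.
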
Yes, you are right.

> The other calls to dlm_lookup_lockres() are from handlers and those
> handlers can only be tickled if a lock already exists. And if a lock
> exists, then we cannot be purging the lockres.
>
> The one exception is the create_lock handler and that only comes
> into play on the lockres master. The inflight ref blocks removal of
> such lockres in the window before the lock is created.

I think there is another exception, dlm_mig_lockres_handler(). Could you
check it in my email (in this thread, to Srini, Message-ID:
<20100617110548.ga3...@laptop.us.oracle.com>)?

> DROPPING_REF is only valid for non-master nodes. As in, only
> a non-master node has to send a deref message to the master node.

Yes, I know.

> Confused? Well, I think this needs to be documented. I guess I will
> do that after I am done with the global heartbeat business.

No, I am clear. Well, a document would be greatly helpful!

regards,
wengang.

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel