Hi,

>> After looking more closely, this is a subtle form of conversion deadlock,
>> and this exact case is described in the comment here:
Thanks. We are withdrawing this patch for now, since we need to look into it further.

--
owa

-----Original Message-----
From: David Teigland [mailto:teigl...@redhat.com]
Sent: Thursday, August 10, 2017 3:48 AM
To: owa tsutomu(大輪 勤 TMC ○SSDジ□ES技○ES五)
Cc: cluster-devel@redhat.com; miyauchi tadashi(宮内 忠志 TOPS (SW開)[基本])
Subject: Re: [Cluster-devel] [PATCH 13/17] dlm: fix _can_be_granted() for lock at the head of covert queue.

On Wed, Aug 09, 2017 at 11:41:44AM -0500, David Teigland wrote:
> On Wed, Aug 09, 2017 at 05:51:37AM +0000, tsutomu....@toshiba.co.jp wrote:
> > If there is a lock resource conflict on multiple nodes, the lock on
> > the convert queue may never be granted.
> >
> > Example:
> > grant queue:
> >   node0 grmode NL / rqmode IV
> >   node1 grmode NL / rqmode IV
> >
> > convert queue:
> >   node2 grmode NL / rqmode EX
> >   node3 grmode PR / rqmode EX
> >
> > wait queue:
> >   node4 grmode IV / rqmode PR
> >   node5 grmode IV / rqmode PR
> >
> > When the lock conversion (PR -> NL) of node0 is completed, the lock
> > of node2 should be grantable. However, _can_be_granted() returns 0
> > because the grmode of node3's lock on the convert queue is PR.
> >
> > When checking the lock at the head of the convert queue, exclude
> > queue_conflict() targeting the convert queue.
>
> This example doesn't look right. node2's NL->EX cannot be granted because
> it conflicts with the PR lock held by node3. (The grmode is still valid
> when a lock is on the convert queue.)

After looking more closely, this is a subtle form of conversion deadlock,
and this exact case is described in the comment here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/dlm/lock.c#n2218

This should be handled by the dlm canceling one of the converting locks
(returning it to the grant queue with IV rqmode) and returning -EDEADLK to
the application.
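To make the queue reasoning above concrete, here is a small user-space toy model in C. It is not the kernel code: `struct toy_rsb`, `toy_can_grant_convert()`, `toy_find_conversion_deadlock()` and `toy_cancel_conversion()` are all hypothetical names. It assumes the standard six-mode compatibility matrix, strict FIFO granting on the convert queue (no QUECVT/NOORDER handling and no in-place "now" conversions), and it ignores the wait queue, which does not affect the converts in this example.

```c
#include <errno.h>

/* Toy lock modes, numbered like the DLM's (IV = -1, NL = 0 ... EX = 5). */
enum toy_mode { IV = -1, NL = 0, CR, CW, PR, PW, EX };

/* Standard compatibility matrix, indexed [held grmode][requested rqmode]. */
static const int compat[6][6] = {
	/*         NL CR CW PR PW EX */
	/* NL */ { 1, 1, 1, 1, 1, 1 },
	/* CR */ { 1, 1, 1, 1, 1, 0 },
	/* CW */ { 1, 1, 1, 0, 0, 0 },
	/* PR */ { 1, 1, 0, 1, 0, 0 },
	/* PW */ { 1, 1, 0, 0, 0, 0 },
	/* EX */ { 1, 0, 0, 0, 0, 0 },
};

/* IV means "no mode", so it never conflicts with anything. */
int toy_compat(int held, int requested)
{
	if (held == IV || requested == IV)
		return 1;
	return compat[held][requested];
}

struct toy_lock {
	int nodeid;
	int grmode;
	int rqmode;
};

#define TOY_MAX 8

struct toy_rsb {
	struct toy_lock grant[TOY_MAX];
	int ngrant;
	struct toy_lock convert[TOY_MAX];
	int nconvert;
};

/*
 * A queued conversion is grantable only if it is first on the convert
 * queue and its rqmode is compatible with the grmode of every other lock,
 * including the grmodes of locks still waiting on the convert queue.
 */
int toy_can_grant_convert(const struct toy_rsb *r, int idx)
{
	int rq, i;

	if (idx != 0 || r->nconvert == 0)
		return 0;	/* FIFO: converts behind the head must wait */
	rq = r->convert[0].rqmode;
	for (i = 0; i < r->ngrant; i++)
		if (!toy_compat(r->grant[i].grmode, rq))
			return 0;
	for (i = 1; i < r->nconvert; i++)
		if (!toy_compat(r->convert[i].grmode, rq))
			return 0;
	return 1;
}

/*
 * Conversion deadlock: the head is blocked by the grmode of a lock behind
 * it, and that lock can never move because it is not first.
 * Returns the index of the blocking converter, or -1 if there is none.
 */
int toy_find_conversion_deadlock(const struct toy_rsb *r)
{
	int i;

	for (i = 1; i < r->nconvert; i++)
		if (!toy_compat(r->convert[i].grmode, r->convert[0].rqmode))
			return i;
	return -1;
}

/*
 * Break the deadlock by cancelling one conversion: move the lock back to
 * the grant queue, keep its grmode, set rqmode to IV, report -EDEADLK.
 */
int toy_cancel_conversion(struct toy_rsb *r, int idx)
{
	struct toy_lock lk;
	int i;

	if (idx < 0 || idx >= r->nconvert || r->ngrant >= TOY_MAX)
		return -EINVAL;
	lk = r->convert[idx];
	for (i = idx; i < r->nconvert - 1; i++)
		r->convert[i] = r->convert[i + 1];
	r->nconvert--;
	lk.rqmode = IV;
	r->grant[r->ngrant++] = lk;
	return -EDEADLK;
}
```

With the example queues, the head (node2, NL->EX) is refused because node3's PR grmode conflicts with EX, node3 is refused because it is not first, and the deadlock check points at node3. Cancelling node3's conversion returns -EDEADLK and leaves node3 on the grant queue still holding PR; node2 keeps waiting until node3's owner releases or down-converts that PR, which is why the error has to reach the application. Dropping the convert-queue loop from toy_can_grant_convert(), which is what the withdrawn patch effectively did for the head of the queue, would grant node2 EX while node3 still holds PR.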
There is a FIXME in the code highlighting a case you could be hitting:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/dlm/lock.c#n2504

If you are running into that FIXME, you should see these log messages:

        if (deadlk) {
                log_print("WARN: pending deadlock %x node %d %s",
                          lkb->lkb_id, lkb->lkb_nodeid, r->res_name);
                dlm_dump_rsb(r);
                continue;
        }
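Since the FIXME case is only visible through that log line, a small C helper can pick it out of captured kernel log text. This is a hypothetical sketch (`parse_pending_deadlock()` is not part of the dlm or of any existing tool). It assumes the line contains the literal text `WARN: pending deadlock ` followed by the fields in the `log_print()` format above, and it reads the resource name with `%s`, so a name containing whitespace is cut at the first space.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: recognise the "WARN: pending deadlock" message and
 * extract the lock id (hex), node id and resource name.
 * `name` must point to a buffer of at least 64 bytes.
 * Returns 1 on a match with all three fields parsed, 0 otherwise.
 */
int parse_pending_deadlock(const char *line, unsigned int *lkid,
			   int *nodeid, char *name)
{
	static const char tag[] = "WARN: pending deadlock ";
	const char *p = strstr(line, tag);

	if (p == NULL)
		return 0;
	/* Same field order as the log_print() format string. */
	return sscanf(p + strlen(tag), "%x node %d %63s",
		      lkid, nodeid, name) == 3;
}
```

Running it over each line of, for example, saved `dmesg` output gives the lock id, node id and resource name to look up in the `dlm_dump_rsb()` output that is printed right after the warning.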