Before this patch, dlm would skip the recover_slot phase of recovery if
it still had a valid comm connection to the failed node. However, gfs2
still needs to perform journal replay for that node. Otherwise we run
the risk that the journal replay that happens at reboot time overwrites
metadata we have modified since the locks were released.

Signed-off-by: Bob Peterson <rpete...@redhat.com>
---
 fs/dlm/member.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/dlm/member.c b/fs/dlm/member.c
index 0bc43b35d2c5..155bd52eb018 100644
--- a/fs/dlm/member.c
+++ b/fs/dlm/member.c
@@ -463,17 +463,12 @@ static void dlm_lsop_recover_slot(struct dlm_ls *ls, struct dlm_member *memb)
 	if (!ls->ls_ops || !ls->ls_ops->recover_slot)
 		return;
 
-	/* if there is no comms connection with this node
-	   or the present comms connection is newer
-	   than the one when this member was added, then
-	   we consider the node to have failed (versus
-	   being removed due to dlm_release_lockspace) */
+	/* Recover the slot regardless of whether we have a valid connection.
+	 * The node may have simply withdrawn, but still needs its journal
+	 * replayed. */
 
 	error = dlm_comm_seq(memb->nodeid, &seq);
 
-	if (!error && seq == memb->comm_seq)
-		return;
-
 	slot.nodeid = memb->nodeid;
 	slot.slot = memb->slot;
-- 
2.20.1
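
For review purposes, the behavioural change can be modelled in user space as below. This is only a sketch and not kernel code: struct member_model and both helper functions are hypothetical names invented for illustration, and the error/seq arguments stand in for the result of dlm_comm_seq(). It shows when the recover_slot callback would be reached before and after the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the struct dlm_member fields
 * that dlm_lsop_recover_slot() reads. Not the kernel definition. */
struct member_model {
	int nodeid;
	int slot;
	uint32_t comm_seq;	/* comm sequence recorded for this member */
};

/* Before the patch: the callback is skipped when the sequence lookup
 * succeeded (error == 0) and the sequence matches the recorded one,
 * i.e. the comm connection to the node is still considered valid. */
static bool recover_slot_called_before(const struct member_model *memb,
				       int error, uint32_t seq)
{
	if (!error && seq == memb->comm_seq)
		return false;
	return true;
}

/* After the patch: the early return is gone, so the callback is
 * reached regardless of the connection state. */
static bool recover_slot_called_after(const struct member_model *memb,
				      int error, uint32_t seq)
{
	(void)memb;
	(void)error;
	(void)seq;
	return true;
}
```

The only case where the two models differ is a successful lookup with an unchanged sequence, which corresponds to the withdrawn-but-still-connected node described in the commit message.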