Hi, I have a "philosophical" question about a two-node cluster with filesystems under OCFS2 and Pacemaker/corosync providing HA for both nodes.
My choice was to leave the OCFS2 stack out of the Pacemaker configuration, so I let the o2cb and ocfs2 services start at boot time. I only configure some filesystems of type OCFS2 as clone resources in Pacemaker, and some other resources have colocation constraints on these OCFS2 filesystem clones. It seemed to me that this should work, and that I don't have to put the management of OCFS2 "in" Pacemaker, with the pcmk stack instead of the o2cb stack.

And it works fine ... except if I kill one node (by fence, or reboot -f): then I get dlm errors on the remaining node, some OCFS2 filesystem clones become failed, and of course the colocated resources are stopped.

The errors look like this:

1284997247 2010 Sep 20 17:40:47 node0 kern err kernel (2191,39):dlm_drop_lockres_ref:2210 ERROR: status = -112
1284997247 2010 Sep 20 17:40:47 node0 kern err kernel (2191,39):dlm_purge_lockres:205 ERROR: status = -112
1284997247 2010 Sep 20 17:40:47 node0 kern err kernel (2191,39):dlm_drop_lockres_ref:2210 ERROR: status = -107
1284997247 2010 Sep 20 17:40:47 node0 kern err kernel (2191,39):dlm_purge_lockres:205 ERROR: status = -107
1284997247 2010 Sep 20 17:40:47 node0 kern info kernel ocfs2: Unmounting device (8,112) on (node 1)
1284997247 2010 Sep 20 17:40:47 node0 kern err kernel (2508,3):dlm_do_master_request:1333 ERROR: link to 2 went down!
1284997247 2010 Sep 20 17:40:47 node0 kern err kernel (2508,3):dlm_get_lock_resource:916 ERROR: status = -107
1284997247 2010 Sep 20 17:40:47 node0 syslog err syslog-ng Initiating connection failed, reconnecting; time_reopen='10'
1284997267 2010 Sep 20 17:41:07 node0 syslog err syslog-ng Error resolving hostname; host='syslog-server'
1284997267 2010 Sep 20 17:41:07 node0 kern err kernel (22095,6):dlm_send_proxy_ast_msg:456 ERROR: status = -107
1284997267 2010 Sep 20 17:41:07 node0 kern err kernel (22095,6):dlm_flush_asts:603 ERROR: status = -107
1284997267 2010 Sep 20 17:41:07 node0 syslog err syslog-ng Initiating connection failed, reconnecting; time_reopen='10'
etc.
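
A side note for anyone reading the log: the negative "status" values look like negated Linux errno codes. Assuming that reading is right (it is an interpretation, not something the kernel messages state), -107 would be ENOTCONN and -112 would be EHOSTDOWN, which fits a peer node that has just disappeared. A minimal Python sketch to decode such lines; the helper name decode_status and the small lookup table are only illustrative:

```python
import re

# Linux errno values for the two codes seen in the log, hardcoded so the
# result does not depend on the platform running this script.
LINUX_ERRNO = {
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    112: ("EHOSTDOWN", "Host is down"),
}

STATUS_RE = re.compile(r"status = -(\d+)")


def decode_status(line):
    """Return (code, name, description) for a 'status = -N' line, else None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # Codes outside the small table above are reported as unknown.
    name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in this table"))
    return code, name, text


sample = [
    "(2191,39):dlm_drop_lockres_ref:2210 ERROR: status = -112",
    "(2191,39):dlm_purge_lockres:205 ERROR: status = -107",
]
for line in sample:
    print(decode_status(line))
```

Lines without a "status = -N" field, such as the syslog-ng messages, simply yield None.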
In fact, I get this type of error even /without/ Pacemaker started on any node when I kill one node. So the dlm/ocfs2 errors in syslog seem "normal", but my filesystem clones in Pacemaker do not "take them as normal": some become "Failed" with "unknown error".

So my question is: is my configuration expected to work (and if so, how could I work around this problem)? Or is the pcmk stack really mandatory when we have OCFS2 and Pacemaker together on two nodes?

Thanks for your responses.
Alain
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
