Re: [Linux-HA] About OCFS2 and Pacemaker

Andrew Beekhof Mon, 20 Sep 2010 23:24:28 -0700

On Mon, Sep 20, 2010 at 5:55 PM, Alain.Moulle <[email protected]> wrote:
> Hi
>
> I have a "philosophical" question about two nodes with filesystems on OCFS2
> and Pacemaker/corosync providing HA for both nodes.
>
> My choice was to leave the OCFS2 stack out of the Pacemaker configuration,
> so I let the o2cb and ocfs2 services start at boot time.


Terrible idea, sorry.  You _really_ don't want two parts of the same
cluster using different membership information.

>
> I only configured some filesystems of type OCFS2 as clone resources in
> Pacemaker, and some other resources are colocated with these OCFS2
> filesystem clones.
>
> It seemed to me that this should work, and that I don't have to put the
> management of OCFS2 "in" Pacemaker, using the pcmk stack instead of the
> o2cb stack.
>
> And it works fine ... except if I kill one node (by fencing, or reboot -f):
> then I get dlm errors on the remaining nodes, and some OCFS2 filesystem
> clones become failed and of course the colocated resources are stopped.
> The errors look like this:
> 1284997247 2010 Sep 20 17:40:47 node0 kern err kernel
> (2191,39):dlm_drop_lockres_ref:2210 ERROR: status = -112
> 1284997247 2010 Sep 20 17:40:47 node0 kern err kernel
> (2191,39):dlm_purge_lockres:205 ERROR: status = -112
> 1284997247 2010 Sep 20 17:40:47 node0 kern err kernel
> (2191,39):dlm_drop_lockres_ref:2210 ERROR: status = -107
> 1284997247 2010 Sep 20 17:40:47 node0 kern err kernel
> (2191,39):dlm_purge_lockres:205 ERROR: status = -107
> 1284997247 2010 Sep 20 17:40:47 node0 kern info kernel ocfs2: Unmounting
> device (8,112) on (node 1)
> 1284997247 2010 Sep 20 17:40:47 node0 kern err kernel
> (2508,3):dlm_do_master_request:1333 ERROR: link to 2 went down!
> 1284997247 2010 Sep 20 17:40:47 node0 kern err kernel
> (2508,3):dlm_get_lock_resource:916 ERROR: status = -107
> 1284997247 2010 Sep 20 17:40:47 node0 syslog err syslog-ng Initiating
> connection failed, reconnecting; time_reopen='10'
> 1284997267 2010 Sep 20 17:41:07 node0 syslog err syslog-ng Error
> resolving hostname; host='syslog-server'
> 1284997267 2010 Sep 20 17:41:07 node0 kern err kernel
> (22095,6):dlm_send_proxy_ast_msg:456 ERROR: status = -107
> 1284997267 2010 Sep 20 17:41:07 node0 kern err kernel
> (22095,6):dlm_flush_asts:603 ERROR: status = -107
> 1284997267 2010 Sep 20 17:41:07 node0 syslog err syslog-ng Initiating
> connection failed, reconnecting; time_reopen='10'
> etc.
>
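
For anyone reading the log above: the negative `status` values printed by the
dlm code are kernel errno codes. The sketch below decodes the two values that
appear in the excerpt. The table is hardcoded from the generic Linux errno
numbering (an assumption; a few architectures number them differently), and
`decode_status` is a hypothetical helper name, not part of any OCFS2 tool.

```python
# Generic Linux errno numbers for the two status codes seen in the log.
# Hardcoded (assumption: asm-generic numbering) so the result does not
# depend on the platform that runs this script.
LINUX_ERRNO = {
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    112: ("EHOSTDOWN", "Host is down"),
}


def decode_status(status):
    """Return a readable form of a negative kernel status code."""
    name, text = LINUX_ERRNO.get(-status, ("UNKNOWN", "not in this table"))
    return f"{status} = -{name} ({text})"


for status in (-112, -107):
    print(decode_status(status))
```

Both codes are what one would expect when the peer node has simply
disappeared, which fits the "link to 2 went down!" message in the same excerpt.
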
> And in fact, I get this type of error even /without/ Pacemaker started
> on any node when I kill one node.
>
> So the dlm/ocfs2 errors in syslog seem "normal", but my filesystem clones
> in Pacemaker do not "take them as normal", as some become "Failed" with
> "unknown error".
>
> So my question is:
>   is my configuration expected to work?
>   (and if so, how could I work around this problem?)
> or
>   is the pcmk stack really mandatory when we have ocfs2 and Pacemaker
> together on two nodes?
>
> Thanks for your responses.
> Alain
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
