Hello Colin,

Maybe your services didn't switch because of this:

======================================================
Aug 31 17:19:49 rgmanager #13: Service service:nfsdprj failed to stop cleanly
Aug 31 17:19:49 rgmanager #13: Service service:httpd failed to stop cleanly
======================================================
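If it helps to pull only these numbered rgmanager messages (such as "#13:" and "#47:") out of a longer log, a minimal sketch could look like the following. The function name rgmanager_errors is hypothetical, and the log path in the usage comment is an assumption (the usual RHEL 6 location), so adjust both to your setup.

```shell
# Hypothetical helper: print only the numbered rgmanager messages
# (lines containing "rgmanager #<number>: ", as in the excerpt above).
# Reads the files given as arguments, or standard input if none are given.
rgmanager_errors() {
    grep -E 'rgmanager #[0-9]+: ' "$@"
}

# Usage (log path is an assumption, adjust as needed):
#   rgmanager_errors /var/log/cluster/rgmanager.log
```

This only filters what is already logged; it does not add any information beyond the existing messages.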
To debug the service stop, you can use:

rg_test test /etc/cluster/cluster.conf stop service <NAME_OF_SERVICE>

I think it would be easier to help if you showed your cluster.conf.

Thanks :-)

2012/9/1 Colin Simpson <colin.simp...@iongeo.com>

> Hi
>
> I had a strange issue this afternoon. One of my cluster nodes died
> (possible hw fault or driver issue). But the other node failed to take
> over a number of its services (2 node cluster) when it was successfully
> fenced.
>
> The clustat indicated that the services were still on the original node
> (started) but the top lines correctly stated that the node was "offline".
> The rgmanager log says for this event:
>
> Aug 31 17:19:30 rgmanager [ip] Link detected on bond0
> Aug 31 17:19:30 rgmanager [ip] Local ping to 10.10.1.45 succeeded
> Aug 31 17:19:37 rgmanager State change: bld1uxn1i DOWN
> Aug 31 17:19:49 rgmanager [ip] Checking 10.10.1.46, Level 10
> Aug 31 17:19:49 rgmanager [ip] Checking 10.10.1.45, Level 0
> Aug 31 17:19:49 rgmanager [ip] Checking 10.10.1.33, Level 0
> Aug 31 17:19:49 rgmanager [ip] 10.10.1.46 present on bond0
> Aug 31 17:19:49 rgmanager [ip] Checking 10.10.1.43, Level 0
> Aug 31 17:19:49 rgmanager [ip] 10.10.1.45 present on bond0
> Aug 31 17:19:49 rgmanager [ip] 10.10.1.33 present on bond0
> Aug 31 17:19:49 rgmanager [ip] Link for bond0: Detected
> Aug 31 17:19:49 rgmanager [ip] 10.10.1.43 present on bond0
> Aug 31 17:19:49 rgmanager Taking over service service:nfsdprj from down member bld1uxn1i
> Aug 31 17:19:49 rgmanager [ip] Link for bond0: Detected
> Aug 31 17:19:49 rgmanager [ip] Link for bond0: Detected
> Aug 31 17:19:49 rgmanager #47: Failed changing service status
> Aug 31 17:19:49 rgmanager Taking over service service:httpd from down member bld1uxn1i
> Aug 31 17:19:49 rgmanager [ip] Link detected on bond0
> Aug 31 17:19:49 rgmanager [ip] Link for bond0: Detected
> Aug 31 17:19:49 rgmanager [ip] Link detected on bond0
> Aug 31 17:19:49 rgmanager [ip] Link detected on bond0
> Aug 31 17:19:49 rgmanager #47: Failed changing service status
> Aug 31 17:19:49 rgmanager [ip] Local ping to 10.10.1.46 succeeded
> Aug 31 17:19:49 rgmanager [ip] Link detected on bond0
> Aug 31 17:19:49 rgmanager #13: Service service:nfsdprj failed to stop cleanly
> Aug 31 17:19:49 rgmanager #13: Service service:httpd failed to stop cleanly
>
> A couple of other services did successfully switch after this.
>
> I have seen this a few times (randomly) on various clusters since around
> the time of upgrading to 6.3 from 6.2 (services refusing to cleanly stop
> on a node). It's hard to reproduce and when down we usually just want a
> restart as fast as possible (thereby limiting time for debugging).
>
> How can I see what is causing the "#47: Failed changing service status",
> or is there any more debugging we can turn on in rgmanager to help with
> this?
>
> Or better still, has anyone else seen anything like this?
>
> Thanks
>
> Colin
>
> ________________________________
>
> This email and any files transmitted with it are confidential and are
> intended solely for the use of the individual or entity to whom they are
> addressed. If you are not the original recipient or the person responsible
> for delivering the email to the intended recipient, be advised that you
> have received this email in error, and that any use, dissemination,
> forwarding, printing, or copying of this email is strictly prohibited. If
> you received this email in error, please immediately notify the sender and
> delete the original.
>
> --
> Linux-cluster mailing list
> Linux-cluster@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
this is my life and I live it for as long as God wills
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster