Hello all, following up on the same problem. Can anyone explain this:
The commands below were all run within roughly one minute.

Disable the service:

[r...@node2 log]# clusvcadm -d BBDD
Local machine disabling service:BBDD...Yes

Enable the service:

[r...@node2 log]# clusvcadm -e BBDD
Local machine trying to enable service:BBDD...Success
service:BBDD is now running on node2

OK, the service is running on node2. Try to relocate it to node1:

[r...@node2 log]# clusvcadm -r BBDD -m node1
Trying to relocate service:BBDD to node1...Success
service:BBDD is now running on node1

It works! Fine, try to relocate back to node2:

[r...@node2 log]# clusvcadm -r BBDD -m node2
Trying to relocate service:BBDD to node2...Success
service:BBDD is now running on node2

It works again! I can't believe it. Try to relocate to node1 once more:

[r...@node2 log]# clusvcadm -r BBDD -m node1
Trying to relocate service:BBDD to node1...Failure

Oops, it fails! Why? Why did it work 30 seconds earlier and fail now? In this situation all I can do is disable and enable the service again to get it working. It never comes back up automatically:

[r...@node2 log]# clusvcadm -d BBDD
Local machine disabling service:BBDD...Yes
[r...@node2 log]# clusvcadm -e BBDD
Local machine trying to enable service:BBDD...Success
service:BBDD is now running on node2

Any explanation for this behaviour? I'm completely astonished :-(

TIA

ESG
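One way to narrow down an intermittent relocate failure like this is to replay the service's stop and start phases by hand with rg_test (it ships with rgmanager) on the node that currently owns the service. A minimal sketch, with the service name taken from the transcript above:

    # Run on the node that currently owns service:BBDD.
    clustat
    # Replay the stop phase of the BBDD resource tree and watch for the
    # same "Application Is Still Running" error that rgmanager logs.
    rg_test test /etc/cluster/cluster.conf stop service BBDD
    # Then replay the start phase.
    rg_test test /etc/cluster/cluster.conf start service BBDD

If the stop phase fails here too, the relocate "Failure" is simply clusvcadm reporting that rgmanager could not cleanly stop mysql before moving it, which matches the logs further down the thread.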
2009/2/13 ESGLinux <[email protected]>

> More clues:
>
> Using system-config-cluster, when I try to start a service in the failed
> state I always get an error. I have to disable the service first, to get
> it into the disabled state; from there I can restart the services.
>
> I think I have a problem with the relocate, because I can't do it with
> luci, nor with system-config-cluster, nor with clusvcadm. I always get an
> error when I try it.
>
> Greetings,
>
> ESG
>
>
> 2009/2/13 ESGLinux <[email protected]>
>
>> Hello,
>>
>> The services run OK on node1. If I halt node2 and try to run the
>> services, they run OK on node1. If I run the services without the
>> cluster, they also run OK.
>>
>> I have removed the HTTP service and left only the BBDD service to debug
>> the problem. Here is the log from when the service is running on node2
>> and node1 comes up:
>>
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering GATHER state from 11.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Creating commit token because I am the rep.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Saving state aru 1a high seq received 1a
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Storing new sequence id for ring 17f4
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering COMMIT state.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering RECOVERY state.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [0] member 192.168.1.185:
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep 192.168.1.185
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 1a high delivered 1a received flag 1
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [1] member 192.168.1.188:
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep 192.168.1.188
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 9 high delivered 9 received flag 1
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Did not need to originate any messages in recovery.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Sending initial ORF token
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] CLM CONFIGURATION CHANGE
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] New Configuration:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ]    r(0) ip(192.168.1.185)
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] Members Left:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] Members Joined:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] CLM CONFIGURATION CHANGE
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] New Configuration:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ]    r(0) ip(192.168.1.185)
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ]    r(0) ip(192.168.1.188)
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] Members Left:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] Members Joined:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ]    r(0) ip(192.168.1.188)
>> Feb 13 09:16:00 NODE2 openais[3326]: [SYNC ] This node is within the primary component and will provide service.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering OPERATIONAL state.
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] got nodejoin message 192.168.1.185
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] got nodejoin message 192.168.1.188
>> Feb 13 09:16:00 NODE2 openais[3326]: [CPG  ] got joinlist message from node 2
>> Feb 13 09:16:03 NODE2 kernel: dlm: connecting to 1
>> Feb 13 09:16:24 NODE2 clurgmgrd[4001]: <notice> Relocating service:BBDD to better node node1
>> Feb 13 09:16:24 NODE2 clurgmgrd[4001]: <notice> Stopping service service:BBDD
>> Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: <err> Stopping Service mysql:mydb > Failed - Application Is Still Running
>> Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: <err> Stopping Service mysql:mydb > Failed
>> Feb 13 09:16:25 NODE2 clurgmgrd[4001]: <notice> stop on mysql "mydb" returned 1 (generic error)
>> Feb 13 09:16:25 NODE2 avahi-daemon[3872]: Withdrawing address record for 192.168.1.183 on eth0.
>> Feb 13 09:16:35 NODE2 clurgmgrd[4001]: <crit> #12: RG service:BBDD failed to stop; intervention required
>> Feb 13 09:16:35 NODE2 clurgmgrd[4001]: <notice> Service service:BBDD is failed
>> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <warning> #70: Failed to relocate service:BBDD; restarting locally
>> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <err> #43: Service service:BBDD has failed; can not start.
>> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <alert> #2: Service service:BBDD returned failure code. Last Owner: node2
>> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <alert> #4: Administrator intervention required.
>>
>> As you can see, the log says "Relocating service:BBDD to better node
>> node1", but the relocate fails.
>>
>> Another error that appears frequently in my logs is:
>>
>> <err> Checking Existence Of File /var/run/cluster/mysql/mysql:mydb.pid [mysql:mydb] > Failed - File Doesn't Exist
>>
>> I don't know if this is important, but I think it is what causes the
>> "Stopping Service mysql:mydb > Failed - Application Is Still Running"
>> error, and that in turn makes the service fail (I'm just guessing...).
>>
>> Any idea?
>>
>> ESG
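That guess is plausible: judging by the error above, the mysql agent tracks the daemon through a pid file under /var/run/cluster/mysql/, so if mysqld records its pid somewhere else (or nowhere), the stop phase can conclude the application is still running. A quick check, sketched under that assumption:

    # Does my.cnf point mysqld's pid file somewhere of its own?
    grep -i 'pid-file' /etc/my.cnf
    # What did the cluster's mysql agent actually create?
    ls -l /var/run/cluster/mysql/
    # Is a mysqld still alive right after a failed stop?
    pgrep -fl mysqld

If a mysqld process survives the stop, that is exactly the state that forces the disable/enable cycle described at the top of the thread.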
>>
>> 2009/2/12 rajveer singh <[email protected]>
>>
>>> Hi,
>>>
>>> OK, perhaps there is some problem with the services on node1. Are you
>>> able to run these services on node1 without the cluster? First stop
>>> the cluster, then try to run the services on node1. They should run.
>>>
>>> Re,
>>> Rajveer Singh
>>>
>>> 2009/2/13 ESGLinux <[email protected]>
>>>
>>>> Hello,
>>>>
>>>> That's what I want: when node1 comes up I want the service to
>>>> relocate to node1, but what I get is all my services stopped and in
>>>> the failed state.
>>>>
>>>> With my configuration I expect to end up with the services running
>>>> on node1. Any idea about this behaviour?
>>>>
>>>> Thanks,
>>>>
>>>> ESG
>>>>
>>>> 2009/2/12 rajveer singh <[email protected]>
>>>>
>>>>> 2009/2/12 ESGLinux <[email protected]>
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I'm testing a cluster using luci as the admin tool. I have
>>>>>> configured two nodes with two services, http + mysql. This
>>>>>> configuration works almost fine: I have the services running on
>>>>>> node1, and I reboot node1. The services relocate to node2 and
>>>>>> everything continues working, but when node1 comes back up, all
>>>>>> the services stop.
>>>>>>
>>>>>> I think that node1, when it comes alive, tries to run the
>>>>>> services, and that makes the services stop. Can that be true? I
>>>>>> think node1 should not start anything, because the services are
>>>>>> already running on node2.
>>>>>>
>>>>>> Perhaps it is a problem with the configuration, perhaps with
>>>>>> fencing (I have not configured fencing at all).
>>>>>>
>>>>>> Here is my cluster.conf. Any idea?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> ESG
>>>>>>
>>>>>> <?xml version="1.0"?>
>>>>>> <cluster alias="MICLUSTER" config_version="29" name="MICLUSTER">
>>>>>>     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>>>     <clusternodes>
>>>>>>         <clusternode name="node1" nodeid="1" votes="1">
>>>>>>             <fence/>
>>>>>>         </clusternode>
>>>>>>         <clusternode name="node2" nodeid="2" votes="1">
>>>>>>             <fence/>
>>>>>>         </clusternode>
>>>>>>     </clusternodes>
>>>>>>     <cman expected_votes="1" two_node="1"/>
>>>>>>     <fencedevices/>
>>>>>>     <rm>
>>>>>>         <failoverdomains>
>>>>>>             <failoverdomain name="DOMINIOFAIL" nofailback="0" ordered="1" restricted="1">
>>>>>>                 <failoverdomainnode name="node1" priority="1"/>
>>>>>>                 <failoverdomainnode name="node2" priority="2"/>
>>>>>>             </failoverdomain>
>>>>>>         </failoverdomains>
>>>>>>         <resources>
>>>>>>             <ip address="192.168.1.183" monitor_link="1"/>
>>>>>>         </resources>
>>>>>>         <service autostart="1" domain="DOMINIOFAIL" exclusive="0" name="HTTP" recovery="relocate">
>>>>>>             <apache config_file="conf/httpd.conf" name="http" server_root="/etc/httpd" shutdown_wait="0"/>
>>>>>>             <ip ref="192.168.1.183"/>
>>>>>>         </service>
>>>>>>         <service autostart="1" domain="DOMINIOFAIL" exclusive="0" name="BBDD" recovery="relocate">
>>>>>>             <mysql config_file="/etc/my.cnf" listen_address="192.168.1.183" name="mydb" shutdown_wait="0"/>
>>>>>>             <ip ref="192.168.1.183"/>
>>>>>>         </service>
>>>>>>     </rm>
>>>>>> </cluster>
>>>>>
>>>>> Hi ESG,
>>>>>
>>>>> Of course: as you have defined the priority of node1 as 1 and node2
>>>>> as 2, node1 has the higher priority. Whenever it comes up, it will
>>>>> try to run the service on itself, so it will relocate the service
>>>>> from node2 to node1.
>>>>>
>>>>> Re,
>>>>> Rajveer Singh
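That also explains the failed state: node1 coming up triggers a failback ("Relocating service:BBDD to better node node1"), the stop of mysql on node2 fails, and rgmanager marks the service failed. If the intent is for services to stay where they are until an administrator moves them, the failover domain can be marked nofailback. A sketch against the posted cluster.conf, which already carries that attribute as nofailback="0" (remember to bump config_version when editing):

    <failoverdomain name="DOMINIOFAIL" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>

Independently of that, with <fencedevices/> empty the cluster has no way to recover cleanly from a node it cannot verify as down, so configuring fencing is worth doing before trusting any failover behaviour.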
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
