Hello,

The services run ok on node1. If I halt node2 and try to run the services, they run ok on node1. If I run the services without the cluster, they also run ok.
I have eliminated the HTTP service and left only the BBDD service, to debug the problem. Here is the log on node2 when the service is running there and node1 comes up:

Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering GATHER state from 11.
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Creating commit token because I am the rep.
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Saving state aru 1a high seq received 1a
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Storing new sequence id for ring 17f4
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering COMMIT state.
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering RECOVERY state.
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [0] member 192.168.1.185:
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep 192.168.1.185
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 1a high delivered 1a received flag 1
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [1] member 192.168.1.188:
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep 192.168.1.188
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 9 high delivered 9 received flag 1
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Did not need to originate any messages in recovery.
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Sending initial ORF token
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] CLM CONFIGURATION CHANGE
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] New Configuration:
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ]     r(0) ip(192.168.1.185)
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Left:
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Joined:
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] CLM CONFIGURATION CHANGE
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] New Configuration:
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ]     r(0) ip(192.168.1.185)
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ]     r(0) ip(192.168.1.188)
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Left:
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Joined:
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ]     r(0) ip(192.168.1.188)
Feb 13 09:16:00 NODE2 openais[3326]: [SYNC ] This node is within the primary component and will provide service.
Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering OPERATIONAL state.
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] got nodejoin message 192.168.1.185
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] got nodejoin message 192.168.1.188
Feb 13 09:16:00 NODE2 openais[3326]: [CPG ] got joinlist message from node 2
Feb 13 09:16:03 NODE2 kernel: dlm: connecting to 1
Feb 13 09:16:24 NODE2 clurgmgrd[4001]: <notice> Relocating service:BBDD to better node node1
Feb 13 09:16:24 NODE2 clurgmgrd[4001]: <notice> Stopping service service:BBDD
Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: <err> Stopping Service mysql:mydb > Failed - Application Is Still Running
Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: <err> Stopping Service mysql:mydb > Failed
Feb 13 09:16:25 NODE2 clurgmgrd[4001]: <notice> stop on mysql "mydb" returned 1 (generic error)
Feb 13 09:16:25 NODE2 avahi-daemon[3872]: Withdrawing address record for 192.168.1.183 on eth0.
Feb 13 09:16:35 NODE2 clurgmgrd[4001]: <crit> #12: RG service:BBDD failed to stop; intervention required
Feb 13 09:16:35 NODE2 clurgmgrd[4001]: <notice> Service service:BBDD is failed
Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <warning> #70: Failed to relocate service:BBDD; restarting locally
Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <err> #43: Service service:BBDD has failed; can not start.
Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <alert> #2: Service service:BBDD returned failure code. Last Owner: node2
Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <alert> #4: Administrator intervention required.

As you can see in the message "Relocating service:BBDD to better node node1", the cluster tries to move the service back to node1, but it fails.

Another error that appears frequently in my logs is this one:

<err> Checking Existence Of File /var/run/cluster/mysql/mysql:mydb.pid [mysql:mydb] > Failed - File Doesn't Exist

I don't know if this is important, but I think it is what causes the message "<err> Stopping Service mysql:mydb > Failed - Application Is Still Running", and that in turn is what makes the service fail (I'm just guessing...). Some quick checks around this are sketched at the bottom of this mail, below the quoted thread.

Any idea?

ESG

2009/2/12 rajveer singh <[email protected]>

> Hi,
>
> Ok, perhaps there is some problem with the services on node1, so: are you
> able to run these services on node1 without the cluster? First stop the
> cluster, then try to run these services on node1.
>
> It should run.
>
> Re,
> Rajveer Singh
>
> 2009/2/13 ESGLinux <[email protected]>
>
>> Hello,
>>
>> That's what I want: when node1 comes up I want the services to relocate
>> to node1, but what I get is all my services stopped and in a failed state.
>>
>> With my configuration I expect to have the services running on node1.
>>
>> Any idea about this behaviour?
>>
>> Thanks,
>>
>> ESG
>>
>> 2009/2/12 rajveer singh <[email protected]>
>>
>>> 2009/2/12 ESGLinux <[email protected]>
>>>
>>>> Hello all,
>>>>
>>>> I'm testing a cluster using luci as the admin tool. I have configured
>>>> 2 nodes with 2 services, http + mysql. This configuration works almost
>>>> fine. I have the services running on node1, and I reboot node1. Then
>>>> the services relocate to node2 and everything continues working, but
>>>> when node1 comes back up all the services stop.
>>>>
>>>> I think that node1, when it comes alive, tries to run the services and
>>>> that makes the services stop. Can that be true? I think node1 should
>>>> not start anything because the services are running on node2.
>>>>
>>>> Perhaps it is a problem with the configuration, perhaps with fencing
>>>> (I have not configured fencing at all).
>>>>
>>>> Here is my cluster.conf. Any idea?
>>>>
>>>> Thanks in advance,
>>>>
>>>> ESG
>>>>
>>>> <?xml version="1.0"?>
>>>> <cluster alias="MICLUSTER" config_version="29" name="MICLUSTER">
>>>>     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>     <clusternodes>
>>>>         <clusternode name="node1" nodeid="1" votes="1">
>>>>             <fence/>
>>>>         </clusternode>
>>>>         <clusternode name="node2" nodeid="2" votes="1">
>>>>             <fence/>
>>>>         </clusternode>
>>>>     </clusternodes>
>>>>     <cman expected_votes="1" two_node="1"/>
>>>>     <fencedevices/>
>>>>     <rm>
>>>>         <failoverdomains>
>>>>             <failoverdomain name="DOMINIOFAIL" nofailback="0" ordered="1" restricted="1">
>>>>                 <failoverdomainnode name="node1" priority="1"/>
>>>>                 <failoverdomainnode name="node2" priority="2"/>
>>>>             </failoverdomain>
>>>>         </failoverdomains>
>>>>         <resources>
>>>>             <ip address="192.168.1.183" monitor_link="1"/>
>>>>         </resources>
>>>>         <service autostart="1" domain="DOMINIOFAIL" exclusive="0" name="HTTP" recovery="relocate">
>>>>             <apache config_file="conf/httpd.conf" name="http" server_root="/etc/httpd" shutdown_wait="0"/>
>>>>             <ip ref="192.168.1.183"/>
>>>>         </service>
>>>>         <service autostart="1" domain="DOMINIOFAIL" exclusive="0" name="BBDD" recovery="relocate">
>>>>             <mysql config_file="/etc/my.cnf" listen_address="192.168.1.183" name="mydb" shutdown_wait="0"/>
>>>>             <ip ref="192.168.1.183"/>
>>>>         </service>
>>>>     </rm>
>>>> </cluster>
>>>>
>>>
>>> Hi ESG,
>>>
>>> Of course: as you have defined the priority of node1 as 1 and node2 as 2,
>>> node1 has the higher priority, so whenever it comes up it will try to run
>>> the service on itself, and so it will relocate the service from node2 to
>>> node1.
>>>
>>> Re,
>>> Rajveer Singh
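P.S. About the "/var/run/cluster/mysql/mysql:mydb.pid ... File Doesn't Exist" error above: a few quick checks that might narrow it down, run on the node that currently owns service:BBDD. This is only a sketch; the paths are taken from the log above, and the stock rgmanager mysql resource agent is assumed.

    pgrep -l mysqld                              # is mysqld already running outside rgmanager's control?
    ls -l /var/run/cluster/mysql/mysql:mydb.pid  # the pid file the resource agent is checking for
    grep -i pid-file /etc/my.cnf                 # a pid-file setting here may point somewhere the agent does not look

If mysqld was started outside the cluster, or /etc/my.cnf puts the pid file somewhere else, the agent's status and stop checks can fail in just the way the log shows ("Application Is Still Running" on stop).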
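Another thing that might be worth trying, to keep rgmanager from relocating BBDD back as soon as node1 rejoins (which is what triggers the failed stop): disable failback on the failover domain. A minimal sketch, assuming the rest of cluster.conf stays exactly as posted in the quoted thread (config_version would need bumping and the new file propagating to both nodes):

    <failoverdomain name="DOMINIOFAIL" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>

This only avoids the automatic relocation, though; the underlying stop failure on the mysql resource would still need fixing.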
-- Linux-cluster mailing list [email protected] https://www.redhat.com/mailman/listinfo/linux-cluster
