More clues, using system-config-cluster
When I try to run a service that is in the failed state I always get an error. I have to disable the service first to get it into the disabled state; from there I can restart it. I think I have a problem with relocation, because I can't relocate with luci, with system-config-cluster, or with clusvcadm: I always get an error when I try.

greetings,

ESG

2009/2/13 ESGLinux <[email protected]>

> Hello,
>
> The services run OK on node1. If I halt node2 and try to run the services,
> they run OK on node1.
> If I run the services without the cluster, they also run OK.
>
> I have eliminated the HTTP service and left only the BBDD service to debug
> the problem. Here is the log when the service is running on node2 and
> node1 comes up:
>
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering GATHER state from 11.
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Creating commit token because I am the rep.
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Saving state aru 1a high seq received 1a
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Storing new sequence id for ring 17f4
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering COMMIT state.
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering RECOVERY state.
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [0] member 192.168.1.185:
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep 192.168.1.185
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 1a high delivered 1a received flag 1
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [1] member 192.168.1.188:
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep 192.168.1.188
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 9 high delivered 9 received flag 1
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Did not need to originate any messages in recovery.
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Sending initial ORF token
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] CLM CONFIGURATION CHANGE
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] New Configuration:
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.185)
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Left:
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Joined:
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] CLM CONFIGURATION CHANGE
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] New Configuration:
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.185)
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.188)
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Left:
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Joined:
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.188)
> Feb 13 09:16:00 NODE2 openais[3326]: [SYNC ] This node is within the primary component and will provide service.
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering OPERATIONAL state.
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] got nodejoin message 192.168.1.185
> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] got nodejoin message 192.168.1.188
> Feb 13 09:16:00 NODE2 openais[3326]: [CPG ] got joinlist message from node 2
> Feb 13 09:16:03 NODE2 kernel: dlm: connecting to 1
> Feb 13 09:16:24 NODE2 clurgmgrd[4001]: <notice> Relocating service:BBDD to better node node1
> Feb 13 09:16:24 NODE2 clurgmgrd[4001]: <notice> Stopping service service:BBDD
> Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: <err> Stopping Service mysql:mydb > Failed - Application Is Still Running
> Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: <err> Stopping Service mysql:mydb > Failed
> Feb 13 09:16:25 NODE2 clurgmgrd[4001]: <notice> stop on mysql "mydb" returned 1 (generic error)
> Feb 13 09:16:25 NODE2 avahi-daemon[3872]: Withdrawing address record for 192.168.1.183 on eth0.
> Feb 13 09:16:35 NODE2 clurgmgrd[4001]: <crit> #12: RG service:BBDD failed to stop; intervention required
> Feb 13 09:16:35 NODE2 clurgmgrd[4001]: <notice> Service service:BBDD is failed
> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <warning> #70: Failed to relocate service:BBDD; restarting locally
> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <err> #43: Service service:BBDD has failed; can not start.
> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <alert> #2: Service service:BBDD returned failure code. Last Owner: node2
> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <alert> #4: Administrator intervention required.
>
> As you can see, the log says "Relocating service:BBDD to better node
> node1", but the relocation fails.
>
> Another error that appears frequently in my logs is this one:
>
> <err> Checking Existence Of File /var/run/cluster/mysql/mysql:mydb.pid
> [mysql:mydb] > Failed - File Doesn't Exist
>
> I don't know if this is important, but I think it is what causes the
> "<err> Stopping Service mysql:mydb > Failed - Application Is Still
> Running" message, and that in turn makes the service fail (I'm just
> guessing...).
>
> Any idea?
>
> ESG
>
> 2009/2/12 rajveer singh <[email protected]>
>
>> Hi,
>>
>> OK, perhaps there is some problem with the services on node1. So, are you
>> able to run these services on node1 without the cluster? First stop the
>> cluster, then try to run the services on node1.
>>
>> They should run.
>>
>> Re,
>> Rajveer Singh
>>
>> 2009/2/13 ESGLinux <[email protected]>
>>
>>> Hello,
>>>
>>> That's what I want: when node1 comes up I want the services to relocate
>>> to node1, but what I get is all my services stopped and in the failed
>>> state.
>>>
>>> With my configuration I expect to have the services running on node1.
>>>
>>> Any idea about this behaviour?
>>>
>>> Thanks,
>>>
>>> ESG
>>>
>>> 2009/2/12 rajveer singh <[email protected]>
>>>
>>>> 2009/2/12 ESGLinux <[email protected]>
>>>>
>>>>> Hello all,
>>>>>
>>>>> I'm testing a cluster using luci as the admin tool.
>>>>> I have configured 2 nodes with 2 services, http + mysql. This
>>>>> configuration works almost fine. I have the services running on node1,
>>>>> and I reboot node1. Then the services relocate to node2 and everything
>>>>> continues working, but when node1 comes back up all the services stop.
>>>>>
>>>>> I think that node1, when it comes alive, tries to run the services and
>>>>> that makes the services stop. Can that be true? I think node1 should
>>>>> not start anything, because the services are running on node2.
>>>>>
>>>>> Perhaps it is a problem with the configuration, perhaps with fencing
>>>>> (I have not configured fencing at all).
>>>>>
>>>>> Here is my cluster.conf. Any idea?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> ESG
>>>>>
>>>>> <?xml version="1.0"?>
>>>>> <cluster alias="MICLUSTER" config_version="29" name="MICLUSTER">
>>>>>     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>>     <clusternodes>
>>>>>         <clusternode name="node1" nodeid="1" votes="1">
>>>>>             <fence/>
>>>>>         </clusternode>
>>>>>         <clusternode name="node2" nodeid="2" votes="1">
>>>>>             <fence/>
>>>>>         </clusternode>
>>>>>     </clusternodes>
>>>>>     <cman expected_votes="1" two_node="1"/>
>>>>>     <fencedevices/>
>>>>>     <rm>
>>>>>         <failoverdomains>
>>>>>             <failoverdomain name="DOMINIOFAIL" nofailback="0" ordered="1" restricted="1">
>>>>>                 <failoverdomainnode name="node1" priority="1"/>
>>>>>                 <failoverdomainnode name="node2" priority="2"/>
>>>>>             </failoverdomain>
>>>>>         </failoverdomains>
>>>>>         <resources>
>>>>>             <ip address="192.168.1.183" monitor_link="1"/>
>>>>>         </resources>
>>>>>         <service autostart="1" domain="DOMINIOFAIL" exclusive="0" name="HTTP" recovery="relocate">
>>>>>             <apache config_file="conf/httpd.conf" name="http" server_root="/etc/httpd" shutdown_wait="0"/>
>>>>>             <ip ref="192.168.1.183"/>
>>>>>         </service>
>>>>>         <service autostart="1" domain="DOMINIOFAIL" exclusive="0" name="BBDD"
>>>>> recovery="relocate">
>>>>>             <mysql config_file="/etc/my.cnf" listen_address="192.168.1.183" name="mydb" shutdown_wait="0"/>
>>>>>             <ip ref="192.168.1.183"/>
>>>>>         </service>
>>>>>     </rm>
>>>>> </cluster>
>>>>>
>>>>> --
>>>>> Linux-cluster mailing list
>>>>> [email protected]
>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>
>>>> Hi ESG,
>>>>
>>>> Of course: as you have defined the priority of node1 as 1 and node2 as
>>>> 2, node1 has the higher priority, so whenever it comes up it will try
>>>> to run the service on itself, and so it will relocate the service from
>>>> node2 to node1.
>>>>
>>>> Re,
>>>> Rajveer Singh
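[Editor's note: the failback Rajveer describes can be suppressed in the failover domain itself. A minimal sketch of the relevant cluster.conf fragment, using the node and domain names from this thread; this is an untested illustration, and config_version would need to be bumped before propagating it:]

```xml
<!-- Sketch only: with nofailback="1", a recovered node1 no longer pulls a
     running service back from node2; relocation then happens only when the
     administrator requests it. ordered/restricted kept as in the thread. -->
<failoverdomains>
    <failoverdomain name="DOMINIOFAIL" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>
</failoverdomains>
```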
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
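[Editor's note: two practical points raised in this thread can be sketched as commands. First, rgmanager will not start a service that is in the failed state; it must be disabled first, which matches what ESG observed. Second, the "File Doesn't Exist" / "Application Is Still Running" pair suggests mysqld is not writing the pid file the resource agent checks. The service name BBDD and the paths come from the thread; treat the exact sequence as a hedged sketch to adapt, not a verified procedure:]

```
# Recover a service from the failed state: failed -> disabled -> started
clusvcadm -d BBDD            # disable clears the failed state
clusvcadm -e BBDD -m node1   # enable (start) it on a chosen member

# Relocation only applies to a service that is currently started:
clusvcadm -r BBDD -m node2

# Check whether mysqld writes the pid file the agent looks for
# (the path below is the one reported in the log, not a default):
grep -i pid-file /etc/my.cnf
ls -l /var/run/cluster/mysql/mysql:mydb.pid
```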
