Re: [Linux-HA] Heartbeat Restart is not same as Stop and Start
Mike,

I checked the permissions and they are fine. If you can, please check the restart branch of the script I have given below: it does not touch the heartbeat lock file (touch $LOCKDIR/$SUBSYS) when heartbeat is restarted, and I think that is the problem. Is it not?

Btw, we have a product for a web application, and as part of it we allow administrators to configure servers as redundant servers; underneath, we use Linux-HA to set up the redundancy.

Rahul
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
[Linux-HA] Heartbeat Restart is not same as Stop and Start
Hi,

Our system setup:

  - Heartbeat 3.0.3
  - DRBD (manages the file system; it is one of the resources managed by the CRM)
  - Red Hat Linux
  - Pacemaker

We have built an application on top of Linux-HA that lets users configure a cluster by giving the IP addresses of the nodes and perform operations such as Restart System, Change Host Names, Resolve Split-Brain, etc. In our application we ran into a problem when we do a heartbeat restart for some operation and the user then does Restart System, which internally runs the command shutdown -r now. I believe this is due to the heartbeat LSB script, and I have explained the scenario below.

Problem: In the heartbeat LSB script, restart neither removes nor touches the heartbeat lock file.

  - On heartbeat start, the LSB script starts heartbeat and touches the /var/lock/subsys/heartbeat lock file.
  - On heartbeat stop, the LSB script stops heartbeat and removes the lock file at /var/lock/subsys/heartbeat.
  - On heartbeat restart, the LSB script stops heartbeat and starts heartbeat, but does NOT remove or touch the lock file.

We call heartbeat restart instead of heartbeat start from our script because we are not sure whether heartbeat is already running. When heartbeat restart is called while heartbeat is NOT running, the LSB script tries to stop it, finds it is not running, and simply starts it. After starting, however, the lock file is not touched (because of the restart branch in the LSB script). So heartbeat is now running on the system (this can be verified by looking for the heartbeat process or with the heartbeat status command), but there is no /var/lock/subsys/heartbeat lock file.

On Red Hat, this lock file is used by the init scripts (not by the kernel itself) to decide which services have to be stopped when the system shuts down (shutdown -r now). When we run shutdown -r now, the init scripts conclude that heartbeat is not running (because there is no lock file) and do not stop heartbeat properly. When the node comes back up, heartbeat is started, but its state is not correct (because it was not stopped properly).
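The shutdown behaviour described here can be modelled with a short sketch. It assumes a Red Hat style rc script that runs a service's stop action only when a lock file named after the service exists in the lock directory. The function name services_to_stop, the example service sshd, and the temporary directory standing in for /var/lock/subsys are illustrative and are not taken from the real script.

```shell
#!/bin/sh
# Simplified model (an assumption, not the real /etc/rc.d/rc): at shutdown,
# a service is stopped only if its subsys lock file exists.

# Print the names of the given services that would be stopped.
# $1 = lock directory, remaining arguments = service names.
services_to_stop() {
    lockdir=$1
    shift
    for svc in "$@"; do
        # No lock file: the service is treated as not running and is skipped.
        [ -f "$lockdir/$svc" ] || continue
        echo "$svc"
    done
}

# Demo with a throwaway directory standing in for /var/lock/subsys.
LOCKDIR=$(mktemp -d)
touch "$LOCKDIR/sshd"      # hypothetical service started via "start": lock file present
# heartbeat was started via "restart", so no lock file was created for it.
STOPPED=$(services_to_stop "$LOCKDIR" sshd heartbeat)
echo "would stop: $STOPPED"
rm -rf "$LOCKDIR"
```

In this model heartbeat is left out of the list even though it is running, which matches the symptom reported: it is never asked to stop cleanly before the reboot.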
Because heartbeat was not stopped cleanly, this node identifies itself as Primary even though the erstwhile Secondary node has since become Primary, and this causes split-brain.

So I believe heartbeat restart should do exactly what heartbeat stop followed by heartbeat start does, which is not the case now. Can you please let me know whether my understanding is correct and whether this is a bug in the heartbeat LSB script?

Thanks for looking into it. I have given the relevant code from the heartbeat LSB script below.

File: /etc/init.d/heartbeat

    start)
        RunStartStop pre-start
        StartHA
        RC=$?
        echo
        if
          [ $RC -eq 0 ]
        then
          [ ! -d $LOCKDIR ] && mkdir -p $LOCKDIR
          touch $LOCKDIR/$SUBSYS
        fi
        RunStartStop post-start $RC
        ;;

    stop)
        RunStartStop pre-stop
        StopHA
        RC=$?
        echo
        if
          [ $RC -eq 0 ]
        then
          rm -f $LOCKDIR/$SUBSYS
        fi
        RunStartStop post-stop $RC
        ;;

    restart)
        sleeptime=`ha_parameter deadtime`
        StopHA
        echo
        echo -n "Waiting to allow resource takeover to complete:"
        sleep $sleeptime
        sleep 10 # allow resource takeover to complete (hopefully).
        echo_success
        echo
        StartHA
        echo
        ;;
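One way to make restart keep the lock file in step with start and stop is sketched below. This is not the upstream fix, only an illustration of the idea that restart should reuse the lock-file handling of the other two branches. StartHA and StopHA are replaced by stubs, LOCKDIR is a temporary directory rather than /var/lock/subsys, the helper names do_start, do_stop and do_restart are hypothetical, and the RunStartStop hooks and the deadtime wait are left out so that the block runs on its own.

```shell
#!/bin/sh
# Sketch only: StartHA/StopHA are stubs standing in for the real functions in
# /etc/init.d/heartbeat, and LOCKDIR is a temporary directory (assumption).
LOCKDIR=$(mktemp -d)
SUBSYS=heartbeat

StartHA() { return 0; }   # stub: pretend heartbeat started successfully
StopHA()  { return 0; }   # stub: pretend heartbeat stopped successfully

# Same lock-file handling as the start) branch.
do_start() {
    StartHA
    RC=$?
    if [ $RC -eq 0 ]; then
        [ ! -d "$LOCKDIR" ] && mkdir -p "$LOCKDIR"
        touch "$LOCKDIR/$SUBSYS"
    fi
    return $RC
}

# Same lock-file handling as the stop) branch.
do_stop() {
    StopHA
    RC=$?
    if [ $RC -eq 0 ]; then
        rm -f "$LOCKDIR/$SUBSYS"
    fi
    return $RC
}

# restart goes through do_stop and do_start, so the lock file is maintained.
do_restart() {
    do_stop
    # The original script sleeps for deadtime plus 10 seconds here; omitted in this sketch.
    do_start
}

do_restart   # heartbeat was not running beforehand
if [ -f "$LOCKDIR/$SUBSYS" ]; then LOCK_AFTER_RESTART=present; else LOCK_AFTER_RESTART=missing; fi
echo "lock file after restart: $LOCK_AFTER_RESTART"

do_stop
if [ -f "$LOCKDIR/$SUBSYS" ]; then LOCK_AFTER_STOP=present; else LOCK_AFTER_STOP=missing; fi
echo "lock file after stop: $LOCK_AFTER_STOP"

rm -rf "$LOCKDIR"
```

A caller-side alternative with the unmodified script would be to invoke stop and then start instead of restart, since those two branches already handle the lock file.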
Re: [Linux-HA] Heartbeat Restart is not same as Stop and Start
Permission problem, perhaps? I'm not really sure what you're doing, but the fact that you have users configuring the cluster (why do you do this, btw?) may point to a permission issue.

-mgb

On 11-08-03 06:57 PM, Rahul Kanna wrote:
> [...]
> We have built an application on top of Linux-HA for users to configure
> cluster by giving IP addresses of the nodes, do operations like Restart
> system, Change host names, Resolve split-brain scenario etc.
> [...]