Permission problem, perhaps? I'm not really sure what you're doing, but the fact that you have users configuring the cluster (why do you do this, btw?) may point to a permission issue.
-mgb

On 11-08-03 06:57 PM, Rahul Kanna wrote:
> Hi,
>
> Our system setup:
>
> Heartbeat 3.0.3
> DRBD (to manage file system and it is one of the resources managed by CRM)
> Redhat Linux
> Pacemaker
>
> We have built an application on top of Linux-HA for users to configure the
> cluster by giving IP addresses of the nodes, do operations like Restart
> system, Change host names, Resolve split-brain scenario etc.
> In our application, we ran into a problem when we do "heartbeat restart" for
> some operation and then when the user does "Restart System", which internally
> runs the command "shutdown -r now". I believe this is due to the heartbeat lsb
> script and I have explained the scenario below.
>
> Problem:
>
> In the heartbeat lsb script, restart does not remove or touch the
> heartbeat lock file.
>
> On "heartbeat start", the lsb script starts heartbeat and touches the
> /var/lock/subsys/heartbeat lock file.
>
> On "heartbeat stop", the lsb script stops heartbeat and removes the lock
> file at /var/lock/subsys/heartbeat.
>
> On "heartbeat restart", the lsb script stops heartbeat and starts
> heartbeat. But it DOES NOT remove or touch the lock file.
>
> We call "heartbeat restart" instead of "heartbeat start" through our script
> because we are not sure whether heartbeat is already running or not. So when
> "heartbeat restart" is called when heartbeat is NOT running, the heartbeat lsb
> script tries to stop but it's not running so it just starts heartbeat BUT
> after starting, the heartbeat lock file is not touched (because of restart in
> heartbeat lsb). So now, in the system heartbeat is running (can verify this
> by looking for the heartbeat process or the "heartbeat status" command) but
> there is no /var/lock/subsys/heartbeat lock file. This lock file is used by
> the Linux kernel to know which processes it has to stop when it shuts down
> (shutdown -r now). When we run "shutdown -r now", the Linux kernel thinks
> heartbeat is not running (because there is no lock file) and does not stop
> heartbeat properly. When it comes back up, heartbeat is started but the
> heartbeat state is not correct (because it was not stopped properly).
> Due to this, this node is identified as Primary though the erstwhile
> Secondary node has become Primary now, and this causes split-brain.
>
> So I believe "heartbeat restart" should do exactly as "heartbeat stop and
> heartbeat start", which is not the case now.
> Can you please let me know if my understanding is correct and it is a bug in
> the Heartbeat lsb script? Thanks for looking into it.
>
> I have given below the relevant code from the heartbeat lsb script as well:
>
> File: /etc/init.d/heartbeat
>
>   start)
>         RunStartStop pre-start
>         StartHA
>         RC=$?
>         echo
>         if
>           [ $RC -eq 0 ]
>         then
>           [ ! -d $LOCKDIR ] && mkdir -p $LOCKDIR
>           touch $LOCKDIR/$SUBSYS
>         fi
>         RunStartStop post-start $RC
>         ;;
>
>   stop)
>         RunStartStop "pre-stop"
>         StopHA
>         RC=$?
>         echo
>         if
>           [ $RC -eq 0 ]
>         then
>           rm -f $LOCKDIR/$SUBSYS
>         fi
>         RunStartStop post-stop $RC
>         ;;
>
>   restart)
>         sleeptime=`ha_parameter deadtime`
>         StopHA
>         echo
>         echo -n "Waiting to allow resource takeover to complete:"
>         sleep $sleeptime
>         sleep 10 # allow resource takeover to complete (hopefully).
>         echo_success
>         echo
>         StartHA
>         echo
>         ;;
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
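
For what it's worth, the restart) branch quoted above really does skip the lock file handling that start) and stop) perform. (As far as I know, on Red Hat style SysV init it is the rc script, not the kernel, that looks in /var/lock/subsys at shutdown, but the effect described is the same.) Below is a minimal sketch of a restart that keeps the lock file in step with the daemon. It is not the shipped script: StartHA and StopHA are stubbed out so the snippet runs on its own, restart_with_lock is a hypothetical name, LOCKDIR points at a temporary directory instead of /var/lock/subsys, and the deadtime wait and RunStartStop hooks are left out.

```shell
#!/bin/sh
# Sketch only: StartHA and StopHA are stand-in stubs, not the real
# functions from /etc/init.d/heartbeat.
LOCKDIR=${LOCKDIR:-$(mktemp -d)}   # assumption: the real script uses /var/lock/subsys
SUBSYS=heartbeat

StartHA() { return 0; }   # stub: pretend heartbeat started successfully
StopHA()  { return 0; }   # stub: pretend heartbeat stopped successfully

# Hypothetical helper: restart while mirroring the lock file handling
# of the quoted start) and stop) branches.
restart_with_lock() {
    # As in stop): remove the lock only if StopHA reported success.
    if StopHA; then
        rm -f "$LOCKDIR/$SUBSYS"
    fi

    # The "sleep $sleeptime; sleep 10" wait from the original would go here.

    # As in start): create the lock only if StartHA reported success.
    StartHA
    RC=$?
    if [ $RC -eq 0 ]; then
        [ -d "$LOCKDIR" ] || mkdir -p "$LOCKDIR"
        touch "$LOCKDIR/$SUBSYS"
    fi
    return $RC
}

restart_with_lock
```

With something along these lines, calling restart while heartbeat is not running would still leave a lock file behind, so the K script would be run at shutdown.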
