Re: [Linux-HA] Heartbeat Restart is not same as Stop and Start

2011-08-04 Thread Rahul Kanna
Mike,

I checked the permission and those are fine.

If you look at the restart section of the script I have given below, you
will see that it does not touch the heartbeat lock file

*touch $LOCKDIR/$SUBSYS*

when heartbeat is restarted, and I guess that is the problem. Is it not?

Btw, we have a product for a web application, and as part of it we allow
administrators to configure servers as redundant servers. Underneath, we
use linux-ha to set up the redundant servers.

Rahul
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Heartbeat Restart is not same as Stop and Start

2011-08-03 Thread Rahul Kanna
Hi,

Our system setup:

Heartbeat 3.0.3
DRBD (manages the file system; it is one of the resources managed by CRM)
Red Hat Linux
Pacemaker

We have built an application on top of Linux-HA that lets users configure a
cluster by giving the IP addresses of the nodes and perform operations such
as Restart System, Change Host Names, Resolve Split-Brain, etc.
In our application, we ran into a problem when we restart heartbeat as part
of some operation and the user later does Restart System, which internally
runs the command shutdown -r now. I believe this is due to the heartbeat LSB
script, and I have explained the scenario below.

Problem:

In the heartbeat LSB script, restart neither removes nor touches the
heartbeat lock file.

On heartbeat start, the LSB script starts heartbeat and touches the
/var/lock/subsys/heartbeat lock file.

On heartbeat stop, the LSB script stops heartbeat and removes the lock
file at /var/lock/subsys/heartbeat.

On heartbeat restart, the LSB script stops heartbeat and starts
heartbeat, but DOES NOT remove or touch the lock file.

We call heartbeat restart instead of heartbeat start from our script
because we cannot be sure whether heartbeat is already running. When
heartbeat restart is called while heartbeat is NOT running, the LSB script
tries to stop it, finds nothing to stop, and simply starts heartbeat. BUT
after the start, the heartbeat lock file is not touched (because the restart
branch of the LSB script never touches it). So now heartbeat is running on
the system (you can verify this by looking for the heartbeat process or with
the heartbeat status command), but there is no /var/lock/subsys/heartbeat
lock file. On Red Hat, this lock file is what the init scripts (the rc
script, not the kernel itself) use to decide which services to stop when the
system shuts down (shutdown -r now). When we run shutdown -r now, the system
thinks heartbeat is not running (because there is no lock file) and does not
stop heartbeat properly. When the node comes back up, heartbeat is started,
but its state is not correct (because it was not stopped properly).
As a result, this node identifies itself as Primary even though the
erstwhile Secondary node has become Primary in the meantime, and this causes
split-brain.
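
The inconsistent state described here (heartbeat running, lock file
missing) can be detected before a reboot. The following is only a minimal
sketch and not part of heartbeat: the function name hb_lock_consistent and
the HB_INIT and LOCKFILE variables are made up for illustration, and it
assumes that the init script's status action exits 0 when heartbeat is
running, as the LSB convention specifies.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with heartbeat).
# Assumption: "$HB_INIT status" exits 0 only when heartbeat is running
# (LSB convention for the status action).
HB_INIT="${HB_INIT:-/etc/init.d/heartbeat}"
LOCKFILE="${LOCKFILE:-/var/lock/subsys/heartbeat}"

# Exit status 0: consistent (heartbeat stopped, or running with lock file).
# Exit status 1: heartbeat is running but the lock file is missing.
hb_lock_consistent() {
    if "$HB_INIT" status >/dev/null 2>&1; then
        [ -f "$LOCKFILE" ]
    else
        return 0
    fi
}
```

A script could call hb_lock_consistent right before shutdown -r now and
log a warning when it returns non-zero.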

So I believe heartbeat restart should do exactly what heartbeat stop
followed by heartbeat start does, which is not the case now.
Can you please let me know whether my understanding is correct and whether
this is a bug in the heartbeat LSB script? Thanks for looking into it.
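
Until the script itself behaves that way, a caller could get the
stop-plus-start behaviour by invoking the two actions explicitly. This is
a minimal sketch, assuming the init script lives at /etc/init.d/heartbeat;
the function name hb_stop_then_start and the HB_INIT variable are
hypothetical.

```shell
#!/bin/sh
# Hypothetical caller-side workaround (not part of heartbeat).
HB_INIT="${HB_INIT:-/etc/init.d/heartbeat}"

# Run stop and then start as two separate actions, so that the lock file
# is removed by the stop branch and touched again by the start branch.
# The result of stop is ignored on purpose: heartbeat may not be running.
# Returns the exit status of the start action.
hb_stop_then_start() {
    "$HB_INIT" stop || true
    "$HB_INIT" start
}
```

Note that the restart branch of the init script also sleeps for deadtime
plus 10 seconds between the two steps. This sketch does not, so a caller
that relies on that pause would have to add it.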

I have given the relevant code from the heartbeat LSB script below as well.

File: /etc/init.d/heartbeat

  start)
        RunStartStop pre-start
        StartHA
        RC=$?
        echo
        if
          [ $RC -eq 0 ]
        then
          [ ! -d $LOCKDIR ] && mkdir -p $LOCKDIR
          touch $LOCKDIR/$SUBSYS
        fi
        RunStartStop post-start $RC
        ;;

  stop)
        RunStartStop pre-stop
        StopHA
        RC=$?
        echo
        if
          [ $RC -eq 0 ]
        then
          rm -f $LOCKDIR/$SUBSYS
        fi
        RunStartStop post-stop $RC
        ;;

  restart)
        sleeptime=`ha_parameter deadtime`
        StopHA
        echo
        echo -n Waiting to allow resource takeover to complete:
        sleep $sleeptime
        sleep 10 # allow resource takeover to complete (hopefully).
        echo_success
        echo
        StartHA
        echo
        ;;
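
For comparison, a restart action that keeps the lock file in step with
stop and start might look roughly like the sketch below. It is written as
a function so that it can be read on its own; restart_with_lock is a
hypothetical name, while StopHA, StartHA, ha_parameter, LOCKDIR and SUBSYS
are assumed to be defined by the init script as in the excerpt above. It
is a sketch of the idea, not an official patch.

```shell
# Sketch of a restart action that maintains the lock file.
# Assumes StopHA, StartHA, ha_parameter, LOCKDIR and SUBSYS come from the
# surrounding init script; restart_with_lock itself is hypothetical.
restart_with_lock() {
    sleeptime=$(ha_parameter deadtime)

    RC=0
    StopHA || RC=$?
    if [ "$RC" -eq 0 ]; then
        rm -f "$LOCKDIR/$SUBSYS"        # same as the stop branch
    fi

    sleep "$sleeptime"
    sleep 10                            # allow resource takeover to complete

    RC=0
    StartHA || RC=$?
    if [ "$RC" -eq 0 ]; then
        [ -d "$LOCKDIR" ] || mkdir -p "$LOCKDIR"
        touch "$LOCKDIR/$SUBSYS"        # same as the start branch
    fi
    return "$RC"
}
```

With this shape, the lock file exists after a successful start even when
heartbeat was not running beforehand, which is the case that leads to the
unclean shutdown described above.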