Permission problem, perhaps? I'm not really sure what you're doing, but the
fact that you have users configuring the cluster (why do you do this,
btw?) may point to a permission issue.
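
One rough way to rule that in or out is to check whether the user your
application runs as can create the lock file at all. This is only a sketch:
the helper name can_write_lock is made up for illustration, and the default
directory /var/lock/subsys is taken from your description below.

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat): succeeds if the current user
# could create a lock file in the given directory. The default path comes
# from the original post.
can_write_lock() {
    lockdir="${1:-/var/lock/subsys}"
    [ -d "$lockdir" ] && [ -w "$lockdir" ]
}

if can_write_lock; then
    echo "lock directory is writable by $(id -un)"
else
    echo "lock directory is NOT writable by $(id -un)"
fi
```

Run it as the same user that your application uses to invoke the init script;
if it reports the directory as not writable, the touch in the start) branch
would fail for that user as well.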

-mgb
On 11-08-03 06:57 PM, Rahul Kanna wrote:
> Hi,
>
> Our system setup:
>
> Heartbeat 3.0.3
> DRBD (manages the file system; it is one of the resources managed by the CRM)
> Redhat Linux
> Pacemaker
>
> We have built an application on top of Linux-HA that lets users configure
> the cluster by giving the IP addresses of the nodes and perform operations
> such as Restart System, Change Host Names, Resolve Split-Brain, etc.
> In our application, we run into a problem when we do "heartbeat restart" for
> some operation and the user then does "Restart System", which internally
> runs the command "shutdown -r now". I believe this is due to the heartbeat
> LSB script; I have explained the scenario below.
>
> Problem:
>
> In the heartbeat LSB script, restart neither removes nor touches the
> heartbeat lock file.
>
> On "heartbeat start", the LSB script starts heartbeat and touches the
> /var/lock/subsys/heartbeat lock file.
>
> On "heartbeat stop", the LSB script stops heartbeat and removes the lock
> file at /var/lock/subsys/heartbeat.
>
> On "heartbeat restart", the LSB script stops heartbeat and starts it
> again, but DOES NOT remove or touch the lock file.
>
> We call "heartbeat restart" instead of "heartbeat start" from our script
> because we are not sure whether heartbeat is already running. When
> "heartbeat restart" is called while heartbeat is NOT running, the LSB
> script tries to stop it, finds nothing to stop, and then starts heartbeat,
> BUT the lock file is not touched after the start (because the restart
> branch never touches it). Heartbeat is now running (this can be verified
> by looking for the heartbeat process or with "heartbeat status"), but there
> is no /var/lock/subsys/heartbeat lock file. On Red Hat, the rc shutdown
> scripts (not the kernel itself) use this lock file to decide which services
> to stop on shutdown ("shutdown -r now"). When we run "shutdown -r now", they
> assume heartbeat is not running (because there is no lock file) and do not
> stop it properly. When the node comes back up, heartbeat is started but its
> state is not correct (because it was not stopped properly).
> As a result, this node identifies itself as Primary even though the
> erstwhile Secondary node has become Primary in the meantime, and this
> causes split-brain.
>
> So I believe "heartbeat restart" should do exactly what "heartbeat stop"
> followed by "heartbeat start" does, which is not the case now.
> Can you please let me know whether my understanding is correct and whether
> this is a bug in the heartbeat LSB script? Thanks for looking into it.
>
> I have given the relevant code from the heartbeat LSB script below:
>
> File: /etc/init.d/heartbeat
>
>    start)
>          RunStartStop pre-start
>          StartHA
>          RC=$?
>          echo
>          if
>            [ $RC -eq 0 ]
>          then
>            [ ! -d $LOCKDIR ]&&  mkdir -p $LOCKDIR
>            touch $LOCKDIR/$SUBSYS
>          fi
>          RunStartStop post-start $RC
>          ;;
>
>    stop)
>          RunStartStop "pre-stop"
>          StopHA
>          RC=$?
>          echo
>          if
>            [ $RC -eq 0 ]
>          then
>            rm -f $LOCKDIR/$SUBSYS
>          fi
>          RunStartStop post-stop $RC
>          ;;
>
>    restart)
>          sleeptime=`ha_parameter deadtime`
>          StopHA
>          echo
>          echo -n "Waiting to allow resource takeover to complete:"
>          sleep $sleeptime
>          sleep 10 # allow resource takeover to complete (hopefully).
>          echo_success
>          echo
>          StartHA
>          echo
>          ;;
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
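
For reference, here is a minimal sketch of how the restart) branch quoted
above could mirror the lock handling of start) and stop). It is not the
upstream fix. StopHA, StartHA, LOCKDIR and SUBSYS are the names used in the
quoted script; the function name restart_with_lock and the stub definitions
are assumptions added only so the snippet runs on its own.

```shell
#!/bin/sh
# Names from the quoted init script; the defaults are for illustration only.
LOCKDIR="${LOCKDIR:-/var/lock/subsys}"
SUBSYS="${SUBSYS:-heartbeat}"

# Hypothetical stand-ins so the sketch is self-contained. The real init
# script defines StopHA and StartHA itself.
StopHA()  { return 0; }
StartHA() { return 0; }

# Hypothetical wrapper: restart as stop + start, with the same lock-file
# bookkeeping that the stop) and start) branches already do.
restart_with_lock() {
    # Same rule as stop): drop the lock only if the stop succeeded.
    if StopHA; then
        rm -f "$LOCKDIR/$SUBSYS"
    fi

    # (The deadtime wait from the quoted restart) branch would go here.)

    StartHA
    rc=$?
    # Same rule as start): create the lock only if the start succeeded.
    if [ "$rc" -eq 0 ]; then
        [ -d "$LOCKDIR" ] || mkdir -p "$LOCKDIR"
        touch "$LOCKDIR/$SUBSYS"
    fi
    return "$rc"
}
```

Until the init script does something along these lines, a caller that cannot
tell whether heartbeat is running could invoke "heartbeat stop" followed by
"heartbeat start" instead of "heartbeat restart", since those two branches
already maintain the lock file.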

