Hi Yamauchi-san,

On Fri, Apr 11, 2008 at 05:22:37PM +0900, HIDEO YAMAUCHI wrote:
> Hi,
>
> I made the environment that did not turn on Heartbeat with chkconfig.
> In this environment, a network stops earlier than Heartbeat.

But it should be the other way around. I would consider this
environment broken so you probably did this only for testing
purposes.

> With these two environmental nodes, I start Heartbeat.
>
> I carried out shutdown with heartbeat having started in a DC node with the
> resource.
> But, shutdown is not completed unless time passes for around 2 minutes.
>
> I more easily confirmed the same situation in the following procedures.
>
> 1)I start Heartbeat in two nodes.
> 2)I confirm that a resource starts in a DC node.
> 3)I stop network service in a DC node.
>   #service network stop
> 4)I stop Heartbeat in a DC node.
>   #service heartbeat stop
> 5)To a stop of Heartbeat of the DC node, it takes approximately 2 minutes.
>
> I want to stop Heartbeat service by shorter time.
> Even if a network falls earlier than Heartbeat service....
>
> In this stop time to take for a long time, can I change it by the setting of
> the parameter of cib?

There are three big time gaps where crmd was waiting:

tengine[558]: 2008/04/11_16:27:02 info: send_rsc_command: Initiating action 7: prmIpPostgreSQLDB_start_0 on dl380g5d
crmd[553]: 2008/04/11_16:27:21 info: handle_shutdown_request: Creating shutdown request for dl380g5c
tengine[558]: 2008/04/11_16:28:02 WARN: action_timer_callback: Timer popped (abort_level=1000000, complete=false)
tengine[558]: 2008/04/11_16:28:02 WARN: print_elem: Action missed its timeout[Action 7]: In-flight (id: prmIpPostgreSQLDB_start_0, loc: dl380g5d, priority: 0)

Here it waited for the start operation to finish. This is a one
minute timeout.

tengine[558]: 2008/04/11_16:31:02 WARN: global_timer_callback: Timer popped (abort_level=1000000, complete=false)
tengine[558]: 2008/04/11_16:31:02 WARN: unconfirmed_actions: Waiting on 1 unconfirmed actions

Again it waited for lrmd. This time for 3 minutes.

There are not many messages from the lrmd. Perhaps I should
include more. Could you please rerun this with debug set to 1.

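For what it's worth, the two gaps between the tengine lines quoted
above can be checked with a few lines of Python. This is only a
sketch: the helper names parse_timestamp and gaps_in_seconds are made
up for illustration, and it assumes the timestamp is always the second
whitespace-separated field, as in the log lines quoted here.

```python
from datetime import datetime

# tengine lines copied from the log excerpts in this message.
LOG_LINES = [
    "tengine[558]: 2008/04/11_16:27:02 info: send_rsc_command: "
    "Initiating action 7: prmIpPostgreSQLDB_start_0 on dl380g5d",
    "tengine[558]: 2008/04/11_16:28:02 WARN: action_timer_callback: "
    "Timer popped (abort_level=1000000, complete=false)",
    "tengine[558]: 2008/04/11_16:31:02 WARN: global_timer_callback: "
    "Timer popped (abort_level=1000000, complete=false)",
]


def parse_timestamp(line):
    """Return the timestamp of one log line (hypothetical helper).

    Assumes the format 'name[pid]: YYYY/MM/DD_HH:MM:SS ...'.
    """
    stamp = line.split()[1]
    return datetime.strptime(stamp, "%Y/%m/%d_%H:%M:%S")


def gaps_in_seconds(lines):
    """Return the delay in seconds between each pair of consecutive lines."""
    times = [parse_timestamp(line) for line in lines]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # First gap: wait for the start operation; second gap: wait for lrmd.
    print(gaps_in_seconds(LOG_LINES))
```

The first value corresponds to the wait for the start operation and
the second to the wait for lrmd.
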
> Is there the influence when I changed a parameter if I can appoint it in a
> parameter?
>
> * I used 2.1.3 versions in 64 bits environment.
> * In addition, this problem does not seem to happen very much
>   in environment of 32 bits version.

That's interesting. I don't see how that could influence
anything.

> * I attached the log that I took.

There's a strange thing there:

crmd[553]: 2008/04/11_16:31:02 info: stop_subsystem: Sent -TERM to pengine: [559]
logd[31553]: 2008/04/11_16:31:12 debug: logd_term_action: waiting for 0 messages to be read by write process
crmd[553]: 2008/04/11_16:31:02 info: do_shutdown: Waiting for subsystems to exit

How come the logd timestamp is ten seconds off?

Also, if it's not too much trouble, could you please switch to
syslog or set syslogmsgfmt to true? I find it hard to keep
following two different message formats.

Thanks,

Dejan

> Regards,
> Hideo Yamauchi.

> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
