Re: [Linux-HA] stderr reboot

Sam Reidland Thu, 27 May 2010 08:13:05 -0700

There is no real-time clock on the board. It is an embedded system and
may or may not have access to NTP in the field. Some boxes are equipped
with GPS and can get the time from that, but my development box never
has the time set.


There are no packages; everything was installed by hand, and on one of
the nodes the hacluster directory didn't get created. We are running
kernel 2.6.32 (soon to be 2.6.33) built with the DENX ELDK 4.1 on a
PPC 85xx, with no package manager, so we had to build all of the HA
components ourselves.
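
For reference, the directory named in the crmd error can be recreated
by hand. A minimal sketch, assuming the /usr/var/lib/heartbeat/cores
prefix shown in the log and an existing hacluster user; the helper name
make_core_dir is hypothetical, and the 0700 mode is an assumption:

```shell
#!/bin/sh
# Hypothetical helper: create a per-user core directory under a base path.
# Usage: make_core_dir BASE USER
make_core_dir() {
    base=$1
    user=$2
    mkdir -p "$base/$user" || return 1
    # Assumption: restrict the directory to its owner.
    chmod 700 "$base/$user" || return 1
    # Change ownership only when running as root and the user exists.
    if [ "$(id -u)" -eq 0 ] && id "$user" >/dev/null 2>&1; then
        chown "$user" "$base/$user" || return 1
    fi
}

# On the target this would be run as root, for example:
#   make_core_dir /usr/var/lib/heartbeat/cores hacluster
```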

lrmd did crash; I will try to get a core file if possible.

Versions:
Pacemaker 1.0.7
Cluster Glue 1.0.3
Heartbeat 3.0.2
Resource Agents 1.0.2

Dejan Muhamedagic wrote:
> Hi,
>
> On Wed, May 26, 2010 at 11:05:05AM -0500, Sam Reidland wrote:
>   
>> I have been working on a simple 2 node 2 resource cluster using
>> Pacemaker 1.0.7 and heartbeat 3.0.2. The two resources are IPaddr and
>> our application. When our application was started, the box would reboot
>> (actually a clean restart). After a lot of searching I found that if I
>> didn't initialize net-SNMP, everything started perfectly. The build of
>> net-SNMP we use spits 2 or 3 lines to stderr when it starts and I
>> noticed that the reboot occurred after the first line to stderr was
>> printed and no other output was seen after that. My OCF script started
>> our app with the following command '/BACKHAUL/bhApplication >/dev/null
>> &'. I changed the command to '/BACKHAUL/bhApplication &>/dev/null &' and
>> everything works as it should. So the question is, why does the HA
>> software cause the box to reboot when something is sent to stderr?
>>     
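
The difference between the two start commands can be reproduced without
the application. A minimal sketch, where noisy_app is a hypothetical
stand-in for /BACKHAUL/bhApplication that writes one line to each
stream. It uses the portable '>/dev/null 2>&1' spelling, since '&>' is
a bash extension and a plain POSIX sh may parse it differently:

```shell
#!/bin/sh
# Hypothetical stand-in for the application: one line to stdout,
# one line to stderr.
noisy_app() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Original start command: only stdout is discarded, so stderr still
# reaches the caller (captured here through the outer 2>&1).
leaked=$( { noisy_app >/dev/null; } 2>&1 )

# Portable equivalent of '&>/dev/null': discard stdout, then point
# stderr at the same place, so nothing reaches the caller.
silent=$( { noisy_app >/dev/null 2>&1; } 2>&1 )

echo "stdout-only redirect leaked: '$leaked'"
echo "full redirect leaked: '$silent'"
```

In the resource agent, the start line would then read
'/BACKHAUL/bhApplication >/dev/null 2>&1 &'.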
>
> Normally it shouldn't. Actually, whatever is caught on stderr
> gets logged by lrmd.
>
> Your clock is obviously not set. You should use ntp to sync
> clocks on all nodes.
>
> There's something wrong with your installation, i.e. some
> directories are missing:
>
>   
>> Jan  1 00:14:23 bh130 daemon.err crmd: [1148]: ERROR: crm_log_init:
>> Cannot change active directory to
>> /usr/var/lib/heartbeat/cores/hacluster: No such file or directory (2)
>>     
>
> Where did you get the packages?
>
> You should start ha_logd. Or otherwise fix the logging setup.
>
>   
>> Jan  1 00:14:07 bh130 daemon.warn ccm: [1143]: WARN: Initializing
>> connection to logging daemon failed. Logging daemon may not be running
>>     
>
>   
>> I'm
>> not even sure what part caused the box to reboot.
>>     
>
> You probably have "crm on" in ha.cf. If one of the subsystems
> leaves, it's considered as a reason to reboot. You can use "crm
> respawn" to prevent reboots.
>
>   
>> I have included the log from a session in which the box rebooted.
>>     
>
> Better to attach logs instead of pasting.
>
>   
> lrmd crashed. Did you find any coredumps? If so, please provide
> backtrace. If not, then enable coredumps, reproduce, and file a
> bugzilla with hb_report. And which cluster-glue version?
>
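
The ha.cf suggestions in this reply could be collected as below. This
is a sketch of a fragment only, not a complete configuration; the file
location depends on the install prefix and is not given in this thread,
and whether these directives suit a hand-built install is untested here:

```
# ha.cf (fragment)

# Respawn a failed subsystem instead of treating its exit as a
# reason to reboot the node.
crm respawn

# Send log messages through the logging daemon; ha_logd has to be
# running for this to work.
use_logd yes

# Ask heartbeat to allow core dumps, so that a crash such as the
# lrmd one leaves a core file to take a backtrace from.
coredumps true
```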
> Thanks,
>
> Dejan
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
>
>   
