> It looks very strange. The UUID of node1 is set correctly (the same as in
> cib.xml). Is this ID stored in /var/lib/heartbeat/hb_uuid?
>
> zentgpfsn01:~ # crm_uuid
> 07ca44ca-1bf5-4f12-8680-21f86c2e6bca
>
> zentgpfsn01:~ # grep zentgpfsn01 /var/lib/heartbeat/crm/cib.xml
> <node uname="zentgpfsn01" type="normal"
> id="07ca44ca-1bf5-4f12-8680-21f86c2e6bca">
>
> The right UUIDs are also set in /var/lib/heartbeat/hostcache:
>
> zentgpfsn01:~ # less /var/lib/heartbeat/hostcache
> zentgpfsn01 07ca44ca-1bf5-4f12-8680-21f86c2e6bca 100
> zentgpfsn02 f44cbb3e-fa3c-4f93-b433-0c9eb4bb5cba 100
> zentgpfsn03 7aa4698a-a17a-4c5b-8cfe-f7226a21aee8 100
>
> When I start Heartbeat on node1, /var/lib/heartbeat/hostcache looks like
> the following:
Hi,

Could you try to remove "/var/lib/heartbeat/hostcache" before starting
Heartbeat, as Andrew says? It might be needed on all nodes. I think I
encountered a similar error when I tried to replace some nodes. At that
time, the hb_delnode command, or removing hostcache, was effective.

Thanks,
Junko

> zentgpfsn01:~ # less /var/lib/heartbeat/hostcache
> zentgpfsn01 07ca44ca-1bf5-4f12-8680-21f86c2e6bca 100
> zentgpfsn02 f44cbb3e-fa3c-4f93-b433-0c9eb4bb5cba 100
> zentgpfsn03 7aa4698a-a17a-4c5b-8cfe-f7226a21aee8 100
> zentgpfsn01 00000000-0000-0000-0000-000000000000 100
>
> It seems as if node1 cannot find its own UUID and starts without one.
> That may be the reason for the entries in the logfile and the reboot:
>
> ccm[19837]: 2008/04/30_09:02:05 ERROR: llm_add: adding same node(zentgpfsn01) twice(?)
> ccm[19837]: 2008/04/30_09:02:05 ERROR: set_llm_from_heartbeat: adding node zentgpfsn01 to llm failed
> ccm[19837]: 2008/04/30_09:02:05 ERROR: Initialization failed. Exit
> heartbeat[19822]: 2008/04/30_09:02:05 WARN: Managed /usr/lib64/heartbeat/ccm process 19837 exited with return code 1.
> heartbeat[19822]: 2008/04/30_09:02:05 EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/ccm
>
> But when the system is up again, the same UUID is still set:
>
> zentgpfsn01:~ # crm_uuid
> 07ca44ca-1bf5-4f12-8680-21f86c2e6bca
>
> Does anybody have an idea? I'm a bit helpless...
>
> Dominik Klein wrote:
> >> here is the cause:
> >>
> >>> ccm[19751]: 2008/04/29_10:59:59 ERROR: llm_add: adding same node(zentgpfsn01) twice(?)
> >>> ccm[19751]: 2008/04/29_10:59:59 ERROR: set_llm_from_heartbeat: adding node zentgpfsn01 to llm failed
> >>> ccm[19751]: 2008/04/29_10:59:59 ERROR: Initialization failed. Exit
> >>> heartbeat[19737]: 2008/04/29_10:59:59 WARN: Managed /usr/lib64/heartbeat/ccm process 19751 exited with return code 1.
> >>
> >> You don't have more than one machine with the same name by any chance?
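The cleanup Junko suggests can be sketched as a small shell script. This is a
hypothetical helper, not from the thread: it assumes the three node names shown
above, SSH access between nodes, and the usual init script path; it defaults to
a dry run (set DRY_RUN=0 to actually execute).

```shell
#!/bin/sh
# Sketch: remove Heartbeat's generated node cache on every node, then
# restart. Heartbeat rebuilds hostcache from ha.cf on the next start.
# DRY_RUN=1 (the default) only prints what would be done.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}

NODES="zentgpfsn01 zentgpfsn02 zentgpfsn03"

# Stop Heartbeat everywhere first, so no node rewrites the cache
# while another one is being cleaned.
for node in $NODES; do
    run ssh "$node" /etc/init.d/heartbeat stop
    run ssh "$node" rm -f /var/lib/heartbeat/hostcache
done

for node in $NODES; do
    run ssh "$node" /etc/init.d/heartbeat start
done
```

Doing the stop/remove pass on all nodes before any restart avoids a half-cleaned
cluster re-propagating the stale zero UUID entry.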
> > I saw something like this when I recently re-installed my test cluster
> > and used an old (backup) configuration file. The UUID changed, so every
> > node was there twice, which ended in quite a mess.
> >
> > Regards
> > Dominik
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
>
> --
> Mit besten Grüßen / Best Regards
>
> Alexander Födisch
>
> Max Planck Institute for Evolutionary Anthropology
> -Central IT Department-
> Deutscher Platz 6
> D-04103 Leipzig
>
> Phone: +49 (0)341 3550-168
>        +49 (0)341 3550-154
> Fax:   +49 (0)341 3550-119
> Email: [EMAIL PROTECTED]
