It looks very strange. The UUID of node1 is set correctly (it is the same as the one
in cib.xml).
Is this ID stored in /var/lib/heartbeat/hb_uuid?


zentgpfsn01:~ # crm_uuid
07ca44ca-1bf5-4f12-8680-21f86c2e6bca

zentgpfsn01:~ # grep zentgpfsn01 /var/lib/heartbeat/crm/cib.xml
     <node uname="zentgpfsn01" type="normal" id="07ca44ca-1bf5-4f12-8680-21f86c2e6bca">
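The same comparison can be scripted. This is a minimal POSIX sh sketch. It assumes that the whole `<node ...>` tag sits on one line of cib.xml and that `crm_uuid` with no arguments prints the local UUID, as shown here. `cib_node_id` and `check_local_uuid` are hypothetical helper names, not heartbeat tools.

```shell
#!/bin/sh
# Hypothetical helper: print the id attribute of the <node> entry whose
# uname equals $1 in the CIB file $2 (defaults to the path used above).
# Assumption: each <node ...> tag is on a single line.
cib_node_id() {
    grep "<node [^>]*uname=\"$1\"" "${2:-/var/lib/heartbeat/crm/cib.xml}" |
        sed -n 's/.* id="\([^"]*\)".*/\1/p'
}

# Hypothetical helper: compare the local UUID reported by crm_uuid with
# the id stored for this host in the CIB. Returns 1 on mismatch.
check_local_uuid() {
    node=$(uname -n)
    local_id=$(crm_uuid)
    cib_id=$(cib_node_id "$node" "$1")
    if [ "$local_id" = "$cib_id" ]; then
        echo "OK: $node uses $local_id"
    else
        echo "MISMATCH on $node: crm_uuid=$local_id cib=$cib_id"
        return 1
    fi
}
```

Run `check_local_uuid` on each node, optionally passing a different cib.xml path as the first argument.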



The right UUIDs are also set in /var/lib/heartbeat/hostcache:

zentgpfsn01:~ # less /var/lib/heartbeat/hostcache
zentgpfsn01     07ca44ca-1bf5-4f12-8680-21f86c2e6bca    100
zentgpfsn02     f44cbb3e-fa3c-4f93-b433-0c9eb4bb5cba    100
zentgpfsn03     7aa4698a-a17a-4c5b-8cfe-f7226a21aee8    100


When I start heartbeat on node1, /var/lib/heartbeat/hostcache looks like this:

zentgpfsn01:~ # less /var/lib/heartbeat/hostcache
zentgpfsn01     07ca44ca-1bf5-4f12-8680-21f86c2e6bca    100     
zentgpfsn02     f44cbb3e-fa3c-4f93-b433-0c9eb4bb5cba    100     
zentgpfsn03     7aa4698a-a17a-4c5b-8cfe-f7226a21aee8    100     
zentgpfsn01     00000000-0000-0000-0000-000000000000    100
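A quick way to spot entries like this is to scan hostcache for repeated node names or the all-zero UUID. This is a minimal sketch. It assumes the whitespace-separated "name UUID weight" layout seen in these listings. `check_hostcache` is a hypothetical helper, not a heartbeat tool.

```shell
#!/bin/sh
# Hypothetical helper: scan a hostcache-style file (assumed layout:
# "uname uuid weight" per line) and report node names that occur more
# than once or that carry the all-zero UUID.
check_hostcache() {
    awk '
        # Ignore blank or malformed lines.
        NF < 2 { next }
        {
            count[$1]++
            if ($2 == "00000000-0000-0000-0000-000000000000")
                print "null UUID for " $1
        }
        END {
            for (n in count)
                if (count[n] > 1)
                    print "duplicate entry for " n " (" count[n] "x)"
        }
    ' "${1:-/var/lib/heartbeat/hostcache}"
}
```

Usage: `check_hostcache /var/lib/heartbeat/hostcache`. For the four-line file shown here, it reports the null UUID and the duplicate zentgpfsn01 entry.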


It seems as if node1 cannot find its own UUID and starts without one. That may be the
reason for the following log entries and the reboot:

ccm[19837]: 2008/04/30_09:02:05 ERROR: llm_add: adding same node(zentgpfsn01) twice(?)
ccm[19837]: 2008/04/30_09:02:05 ERROR: set_llm_from_heartbeat: adding node zentgpfsn01 to llm failed
ccm[19837]: 2008/04/30_09:02:05 ERROR: Initialization failed. Exit
heartbeat[19822]: 2008/04/30_09:02:05 WARN: Managed /usr/lib64/heartbeat/ccm process 19837 exited with return code 1.
heartbeat[19822]: 2008/04/30_09:02:05 EMERG: Rebooting system.  Reason: /usr/lib64/heartbeat/ccm



But when the system is up again, the same UUID is still set:

zentgpfsn01:~ # crm_uuid
07ca44ca-1bf5-4f12-8680-21f86c2e6bca



Does anybody have an idea? I'm a bit helpless...


Dominik Klein wrote:
Here is the cause:

ccm[19751]: 2008/04/29_10:59:59 ERROR: llm_add: adding same node(zentgpfsn01) twice(?)
ccm[19751]: 2008/04/29_10:59:59 ERROR: set_llm_from_heartbeat: adding node zentgpfsn01 to llm failed
ccm[19751]: 2008/04/29_10:59:59 ERROR: Initialization failed. Exit
heartbeat[19737]: 2008/04/29_10:59:59 WARN: Managed /usr/lib64/heartbeat/ccm process 19751 exited with return code 1.

You don't have more than one machine with the same name, by any chance?

I saw something like this when I recently reinstalled my test cluster and used an old (backup) configuration file. The UUIDs changed, so every node appeared twice, which ended in quite a mess.

Regards
Dominik
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems



--
Mit besten Grüßen / Best Regards

Alexander Födisch

Max Planck Institute for Evolutionary Anthropology
-Central IT Department-
Deutscher Platz 6
D-04103 Leipzig

Phone:  +49 (0)341 3550-168
        +49 (0)341 3550-154
Fax:    +49 (0)341 3550-119
Email:  [EMAIL PROTECTED]
