Hi guys, I installed Linux Heartbeat on one machine. The problem is that a few seconds after we start heartbeat, the machine reboots. I can see that the cause is an invalid cib.xml configuration, but is it correct behavior for a machine with an invalid cib.xml to be reset? By the way, STONITH is disabled; the log is attached below. If this behavior is intended, can I disable the reset? This is a server machine, and the reboot process is painfully slow.
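In case it is useful, here is a rough pre-flight check I could run before starting heartbeat. It only mirrors the two cib complaints in the attached log (the crm_is_writable error about cib.xml ownership, and "Expecting an element nodes, got nothing"). It is a sketch, not the real pacemaker-1.0 schema validation: the function name check_cib is made up, and I am assuming the nodes element is expected directly under configuration, as the validation errors suggest.

```python
import os
import pwd
import xml.etree.ElementTree as ET


def check_cib(path, owner="hacluster"):
    """Return a list of problems found in a cib.xml file (hypothetical helper)."""
    problems = []

    # Ownership and permission check, mirroring the crm_is_writable error.
    try:
        st = os.stat(path)
    except OSError as exc:
        return [f"cannot stat {path}: {exc}"]
    try:
        expected_uid = pwd.getpwnam(owner).pw_uid
    except KeyError:
        problems.append(f"user {owner} does not exist on this host")
    else:
        if st.st_uid != expected_uid:
            problems.append(f"{path} is not owned by {owner}")
    if (st.st_mode & 0o600) != 0o600:
        problems.append(f"{path} is not readable and writable by its owner")

    # Structural check, mirroring "Expecting an element nodes, got nothing".
    # Assumption: <nodes> should sit directly under <configuration>.
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as exc:
        problems.append(f"{path} is not well-formed XML: {exc}")
        return problems
    config = root.find("configuration")
    if config is None:
        problems.append("no <configuration> element under the root")
    elif config.find("nodes") is None:
        problems.append("no <nodes> element under <configuration>")
    return problems


if __name__ == "__main__":
    # Path taken from the log; adjust if your installation differs.
    for problem in check_cib("/var/lib/heartbeat/crm/cib.xml"):
        print(problem)
```

An empty list only means these two specific checks passed; the file could still fail the full schema validation that cib performs.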
Thanks.
Bin

Dec 27 23:40:57 ucs22 lrmd: [4993]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Dec 27 23:40:57 ucs22 lrmd: [4993]: info: enabling coredumps
Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Dec 27 23:40:57 ucs22 lrmd: [4993]: info: Started.
Dec 27 23:40:57 ucs22 stonithd: [4994]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
Dec 27 23:40:57 ucs22 stonithd: [4994]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Dec 27 23:40:57 ucs22 stonithd: [4994]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Dec 27 23:40:57 ucs22 attrd: [4995]: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
Dec 27 23:40:57 ucs22 attrd: [4995]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
Dec 27 23:40:57 ucs22 attrd: [4995]: info: Invoked: /usr/lib64/heartbeat/attrd
Dec 27 23:40:57 ucs22 cib: [4992]: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
Dec 27 23:40:57 ucs22 crmd: [4996]: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
Dec 27 23:40:57 ucs22 ccm: [4991]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
Dec 27 23:40:57 ucs22 stonithd: [4994]: info: register_heartbeat_conn: Hostname: ucs22
Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Starting up
Dec 27 23:40:57 ucs22 cib: [4992]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
Dec 27 23:40:57 ucs22 crmd: [4996]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
Dec 27 23:40:57 ucs22 ccm: [4991]: info: Hostname: ucs22
Dec 27 23:40:57 ucs22 stonithd: [4994]: info: register_heartbeat_conn: UUID: b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
Dec 27 23:40:57 ucs22 cib: [4992]: info: Invoked: /usr/lib64/heartbeat/cib
Dec 27 23:40:57 ucs22 crmd: [4996]: info: Invoked: /usr/lib64/heartbeat/crmd
Dec 27 23:40:57 ucs22 stonithd: [4994]: info: crm_cluster_connect: Connecting to Heartbeat
Dec 27 23:40:57 ucs22 cib: [4992]: info: G_main_add_TriggerHandler: Added signal manual handler
Dec 27 23:40:57 ucs22 crmd: [4996]: info: main: CRM Hg Version: da7075976b5ff0bee71074385f8fd02f296ec8a3
Dec 27 23:40:57 ucs22 cib: [4992]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Dec 27 23:40:57 ucs22 crmd: [4996]: info: crmd_init: Starting crmd
Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: crm_is_writable: /var/lib/heartbeat/crm/cib.xml must be owned and r/w by user hacluster
Dec 27 23:40:57 ucs22 crmd: [4996]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Dec 27 23:40:57 ucs22 cib: [4992]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
Dec 27 23:40:57 ucs22 cib: [4992]: WARN: validate_cib_digest: No on-disk digest present
Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Expecting an element nodes, got nothing
Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Invalid sequence in interleave
Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Element configuration failed to validate content
Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Element cib failed to validate content
Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: readCibXmlFile: CIB does not validate with pacemaker-1.0
Dec 27 23:40:57 ucs22 cib: [4992]: info: startCib: CIB Initialization completed successfully
Dec 27 23:40:57 ucs22 attrd: [4995]: info: register_heartbeat_conn: Hostname: ucs22
Dec 27 23:40:57 ucs22 attrd: [4995]: info: register_heartbeat_conn: UUID: b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
Dec 27 23:40:57 ucs22 attrd: [4995]: info: crm_cluster_connect: Connecting to Heartbeat
Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Cluster connection active
Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Accepting attribute updates
Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Starting mainloop...
Dec 27 23:40:57 ucs22 stonithd: [4994]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
Dec 27 23:40:57 ucs22 stonithd: [4994]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Dec 27 23:40:57 ucs22 cib: [4992]: info: register_heartbeat_conn: Hostname: ucs22
Dec 27 23:40:57 ucs22 cib: [4992]: info: register_heartbeat_conn: UUID: b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
Dec 27 23:40:57 ucs22 cib: [4992]: info: crm_cluster_connect: Connecting to Heartbeat
Dec 27 23:40:57 ucs22 cib: [4992]: info: ccm_connect: Registering with CCM...
Dec 27 23:40:57 ucs22 cib: [4992]: WARN: ccm_connect: CCM Activation failed
Dec 27 23:40:57 ucs22 cib: [4992]: WARN: ccm_connect: CCM Connection failed 1 times (30 max)
Dec 27 23:40:58 ucs22 crmd: [4996]: info: do_cib_control: Could not connect to the CIB service: connection failed
Dec 27 23:40:58 ucs22 crmd: [4996]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Dec 27 23:40:58 ucs22 crmd: [4996]: info: crmd_init: Starting crmd's mainloop
Dec 27 23:40:58 ucs22 ccm: [4991]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Dec 27 23:41:00 ucs22 crmd: [4996]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
Dec 27 23:41:00 ucs22 cib: [4992]: info: ccm_connect: Registering with CCM...
Dec 27 23:41:00 ucs22 cib: [4992]: info: cib_init: Requesting the list of configured nodes
Dec 27 23:41:00 ucs22 cib: [4992]: info: cib_init: Starting cib mainloop
Dec 27 23:41:00 ucs22 cib: [4992]: info: cib_client_status_callback: Status update: Client ucs22/cib now has status [join]
Dec 27 23:41:00 ucs22 cib: [4992]: info: crm_new_peer: Node 0 is now known as ucs22
Dec 27 23:41:00 ucs22 cib: [4992]: info: crm_update_peer_proc: ucs22.cib is now online
Dec 27 23:41:01 ucs22 cib: [4992]: info: cib_client_status_callback: Status update: Client ucs22/cib now has status [online]
Dec 27 23:41:01 ucs22 crmd: [4996]: info: do_cib_control: CIB connection established
Dec 27 23:41:01 ucs22 cib: [4992]: ERROR: cib_process_request: Operation ignored, cluster configuration is invalid. Please repair and restart: Update does not conform to the configured schema/DTD
Dec 27 23:41:01 ucs22 cib: [4992]: info: cib_client_status_callback: Status update: Client ucs26/cib now has status [online]
Dec 27 23:41:01 ucs22 cib: [4992]: info: crm_new_peer: Node 0 is now known as ucs26
Dec 27 23:41:01 ucs22 cib: [4992]: info: crm_update_peer_proc: ucs26.cib is now online
Dec 27 23:41:01 ucs22 crmd: [4996]: info: register_heartbeat_conn: Hostname: ucs22
Dec 27 23:41:01 ucs22 crmd: [4996]: info: register_heartbeat_conn: UUID: b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
Dec 27 23:41:01 ucs22 crmd: [4996]: info: crm_cluster_connect: Connecting to Heartbeat
Dec 27 23:41:02 ucs22 crmd: [4996]: info: do_ha_control: Connected to the cluster
Dec 27 23:41:02 ucs22 crmd: [4996]: info: do_ccm_control: CCM connection established... waiting for first callback
Dec 27 23:41:02 ucs22 crmd: [4996]: info: do_started: Delaying start, CCM (0000000000100000) not connected
Dec 27 23:41:02 ucs22 cib: [4992]: ERROR: cib_process_request: Operation ignored, cluster configuration is invalid. Please repair and restart: Update does not conform to the configured schema/DTD
Dec 27 23:41:02 ucs22 crmd: [4996]: ERROR: config_query_callback: Local CIB query resulted in an error: Update does not conform to the configured schema/DTD
Dec 27 23:41:02 ucs22 crmd: [4996]: ERROR: config_query_callback: The cluster is mis-configured - shutting down and staying down
Dec 27 23:41:02 ucs22 crmd: [4996]: notice: crmd_client_status_callback: Status update: Client ucs22/crmd now has status [online] (DC=false)
Dec 27 23:41:02 ucs22 attrd: [4995]: info: cib_connect: Connected to the CIB after 1 signon attempts
Dec 27 23:41:02 ucs22 attrd: [4995]: info: cib_connect: Sending full refresh
Dec 27 23:41:02 ucs22 crmd: [4996]: info: crm_new_peer: Node 0 is now known as ucs22
Dec 27 23:41:02 ucs22 crmd: [4996]: info: crm_update_peer_proc: ucs22.crmd is now online
Dec 27 23:41:02 ucs22 crmd: [4996]: info: crmd_client_status_callback: Not the DC
Dec 27 23:41:02 ucs22 crmd: [4996]: notice: crmd_client_status_callback: Status update: Client ucs22/crmd now has status [online] (DC=false)
Dec 27 23:41:03 ucs22 crmd: [4996]: info: crmd_client_status_callback: Not the DC
Dec 27 23:41:03 ucs22 crmd: [4996]: notice: crmd_client_status_callback: Status update: Client ucs26/crmd now has status [online] (DC=false)
Dec 27 23:41:03 ucs22 cib: [4992]: WARN: cib_peer_callback: Discarding cib_apply_diff message (808) from ucs26: not in our membership
Dec 27 23:41:03 ucs22 crmd: [4996]: info: crm_new_peer: Node 0 is now known as ucs26
Dec 27 23:41:03 ucs22 crmd: [4996]: info: crm_update_peer_proc: ucs26.crmd is now online
Dec 27 23:41:03 ucs22 crmd: [4996]: info: crmd_client_status_callback: Not the DC
Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_log: FSA: Input I_ERROR from config_query_callback() received in state S_STARTING
Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_state_transition: State transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_log: FSA: Input I_ERROR from revision_check_callback() received in state S_RECOVERY
Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_dc_release: DC role released
Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_te_control: Transitioner is now inactive
Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_started: Start cancelled... S_RECOVERY
Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_shutdown: All subsystems stopped, continuing
Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_lrm_control: Disconnected from the LRM
Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_ha_control: Disconnected from Heartbeat
Dec 27 23:41:03 ucs22 ccm: [4991]: info: client (pid=4996) removed from ccm
Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_cib_control: Disconnecting CIB
Dec 27 23:41:03 ucs22 crmd: [4996]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
Dec 27 23:41:03 ucs22 cib: [4992]: ERROR: cib_process_request: Operation ignored, cluster configuration is invalid. Please repair and restart: Update does not conform to the configured schema/DTD
Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
Dec 27 23:41:03 ucs22 cib: [4992]: WARN: send_ipc_message: IPC Channel to 4996 is not connected
Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_exit: Could not recover from internal error
Dec 27 23:41:03 ucs22 cib: [4992]: WARN: send_via_callback_channel: Delivery of reply to client 4996/78d426cb-9410-4d60-99fd-42fa190683c7 failed
Dec 27 23:41:03 ucs22 crmd: [4996]: WARN: do_exit: Inhibiting respawn by Heartbeat
Dec 27 23:41:03 ucs22 cib: [4992]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
Dec 27 23:41:03 ucs22 crmd: [4996]: info: free_mem: Dropping I_RELEASE_SUCCESS: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_dc_release ]
Dec 27 23:41:03 ucs22 crmd: [4996]: info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_exit: [crmd] stopped (100)
Dec 27 23:41:04 ucs22 kernel: device eth2 entered promiscuous mode
Dec 27 23:41:05 ucs22 kernel: md: stopping all md devices.
Dec 27 23:41:06 ucs22 kernel: Synchronizing SCSI cache for disk sdb:
Dec 27 23:41:06 ucs22 kernel: Synchronizing SCSI cache for disk sda:
Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:12:00.1 disabled
Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:12:00.0 disabled
Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:08:00.1 disabled
Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:08:00.0 disabled
Dec 27 23:41:06 ucs22 kernel: usb 8-1: new full speed USB device using uhci_hcd and address 2
Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:05:00.1 disabled
Dec 27 23:41:06 ucs22 kernel: usb 8-1: not running at top speed; connect to a high speed hub
Dec 27 23:41:06 ucs22 kernel: usb 8-1: configuration #1 chosen from 1 choice
Dec 27 23:41:06 ucs22 kernel: hub 8-1:1.0: USB hub found
Dec 27 23:41:06 ucs22 kernel: hub 8-1:1.0: 4 ports detected
Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:05:00.0 disabled
Dec 27 23:47:35 ucs22 syslogd 1.4.1: restart.

_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
