Hi,

On Mon, Dec 27, 2010 at 04:27:14PM +0800, Bin Chen(sunwen_ling) wrote:
> Hi guys,
>
> I installed Linux Heartbeat on one machine. A few seconds after we start
> heartbeat, the machine reboots. I can see that the configuration in cib.xml
> is not valid, but is it correct behavior for a machine with an invalid
> cib.xml to be reset? By the way, STONITH is disabled; the log is attached.
> If this behavior is intended, can I disable the reset? This is a server
> machine, and the reboot process is painfully slow.

I guess that this is because of the "crm yes" directive in ha.cf.
You can change it to "crm respawn".

Thanks,

Dejan

> Thanks.
> Bin
>
> Dec 27 23:40:57 ucs22 lrmd: [4993]: WARN: Initializing connection to logging
> daemon failed. Logging daemon may not be running
> Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added
> signal handler for signal 15
> Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> Dec 27 23:40:57 ucs22 lrmd: [4993]: info: enabling coredumps
> Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added
> signal handler for signal 10
> Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added
> signal handler for signal 12
> Dec 27 23:40:57 ucs22 lrmd: [4993]: info: Started.
> Dec 27 23:40:57 ucs22 stonithd: [4994]: WARN: Initializing connection to
> logging daemon failed. Logging daemon may not be running
> Dec 27 23:40:57 ucs22 stonithd: [4994]: info: G_main_add_SignalHandler:
> Added signal handler for signal 10
> Dec 27 23:40:57 ucs22 stonithd: [4994]: info: G_main_add_SignalHandler:
> Added signal handler for signal 12
> Dec 27 23:40:57 ucs22 attrd: [4995]: info: crm_log_init: Changed active
> directory to /var/lib/heartbeat/cores/hacluster
> Dec 27 23:40:57 ucs22 attrd: [4995]: WARN: Initializing connection to
> logging daemon failed. Logging daemon may not be running
> Dec 27 23:40:57 ucs22 attrd: [4995]: info: Invoked:
> /usr/lib64/heartbeat/attrd
> Dec 27 23:40:57 ucs22 cib: [4992]: info: crm_log_init: Changed active
> directory to /var/lib/heartbeat/cores/hacluster
> Dec 27 23:40:57 ucs22 crmd: [4996]: info: crm_log_init: Changed active
> directory to /var/lib/heartbeat/cores/hacluster
> Dec 27 23:40:57 ucs22 ccm: [4991]: WARN: Initializing connection to logging
> daemon failed. Logging daemon may not be running
> Dec 27 23:40:57 ucs22 stonithd: [4994]: info: register_heartbeat_conn:
> Hostname: ucs22
> Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Starting up
> Dec 27 23:40:57 ucs22 cib: [4992]: WARN: Initializing connection to logging
> daemon failed. Logging daemon may not be running
> Dec 27 23:40:57 ucs22 crmd: [4996]: WARN: Initializing connection to logging
> daemon failed. Logging daemon may not be running
> Dec 27 23:40:57 ucs22 ccm: [4991]: info: Hostname: ucs22
> Dec 27 23:40:57 ucs22 stonithd: [4994]: info: register_heartbeat_conn: UUID:
> b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
> Dec 27 23:40:57 ucs22 cib: [4992]: info: Invoked: /usr/lib64/heartbeat/cib
> Dec 27 23:40:57 ucs22 crmd: [4996]: info: Invoked: /usr/lib64/heartbeat/crmd
> Dec 27 23:40:57 ucs22 stonithd: [4994]: info: crm_cluster_connect:
> Connecting to Heartbeat
> Dec 27 23:40:57 ucs22 cib: [4992]: info: G_main_add_TriggerHandler: Added
> signal manual handler
> Dec 27 23:40:57 ucs22 crmd: [4996]: info: main: CRM Hg Version:
> da7075976b5ff0bee71074385f8fd02f296ec8a3
> Dec 27 23:40:57 ucs22 cib: [4992]: info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> Dec 27 23:40:57 ucs22 crmd: [4996]: info: crmd_init: Starting crmd
> Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: crm_is_writable:
> /var/lib/heartbeat/crm/cib.xml must be owned and r/w by user hacluster
> Dec 27 23:40:57 ucs22 crmd: [4996]: info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> Dec 27 23:40:57 ucs22 cib: [4992]: info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> Dec 27 23:40:57 ucs22 cib: [4992]: WARN: validate_cib_digest: No on-disk
> digest present
> Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Expecting an element nodes, got
> nothing
> Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Invalid sequence in interleave
> Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Element configuration failed to
> validate content
> Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Element cib failed to validate
> content
> Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: readCibXmlFile: CIB does not
> validate with pacemaker-1.0
> Dec 27 23:40:57 ucs22 cib: [4992]: info: startCib: CIB Initialization
> completed successfully
> Dec 27 23:40:57 ucs22 attrd: [4995]: info: register_heartbeat_conn:
> Hostname: ucs22
> Dec 27 23:40:57 ucs22 attrd: [4995]: info: register_heartbeat_conn: UUID:
> b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
> Dec 27 23:40:57 ucs22 attrd: [4995]: info: crm_cluster_connect: Connecting
> to Heartbeat
> Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Cluster connection active
> Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Accepting attribute updates
> Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Starting mainloop...
> Dec 27 23:40:57 ucs22 stonithd: [4994]: notice:
> /usr/lib64/heartbeat/stonithd start up successfully.
> Dec 27 23:40:57 ucs22 stonithd: [4994]: info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> Dec 27 23:40:57 ucs22 cib: [4992]: info: register_heartbeat_conn: Hostname:
> ucs22
> Dec 27 23:40:57 ucs22 cib: [4992]: info: register_heartbeat_conn: UUID:
> b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
> Dec 27 23:40:57 ucs22 cib: [4992]: info: crm_cluster_connect: Connecting to
> Heartbeat
> Dec 27 23:40:57 ucs22 cib: [4992]: info: ccm_connect: Registering with
> CCM...
> Dec 27 23:40:57 ucs22 cib: [4992]: WARN: ccm_connect: CCM Activation failed
> Dec 27 23:40:57 ucs22 cib: [4992]: WARN: ccm_connect: CCM Connection failed
> 1 times (30 max)
> Dec 27 23:40:58 ucs22 crmd: [4996]: info: do_cib_control: Could not connect
> to the CIB service: connection failed
> Dec 27 23:40:58 ucs22 crmd: [4996]: WARN: do_cib_control: Couldn't complete
> CIB registration 1 times... pause and retry
> Dec 27 23:40:58 ucs22 crmd: [4996]: info: crmd_init: Starting crmd's
> mainloop
> Dec 27 23:40:58 ucs22 ccm: [4991]: info: G_main_add_SignalHandler: Added
> signal handler for signal 15
> Dec 27 23:41:00 ucs22 crmd: [4996]: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped!
> Dec 27 23:41:00 ucs22 cib: [4992]: info: ccm_connect: Registering with
> CCM...
> Dec 27 23:41:00 ucs22 cib: [4992]: info: cib_init: Requesting the list of
> configured nodes
> Dec 27 23:41:00 ucs22 cib: [4992]: info: cib_init: Starting cib mainloop
> Dec 27 23:41:00 ucs22 cib: [4992]: info: cib_client_status_callback: Status
> update: Client ucs22/cib now has status [join]
> Dec 27 23:41:00 ucs22 cib: [4992]: info: crm_new_peer: Node 0 is now known
> as ucs22
> Dec 27 23:41:00 ucs22 cib: [4992]: info: crm_update_peer_proc: ucs22.cib is
> now online
> Dec 27 23:41:01 ucs22 cib: [4992]: info: cib_client_status_callback: Status
> update: Client ucs22/cib now has status [online]
> Dec 27 23:41:01 ucs22 crmd: [4996]: info: do_cib_control: CIB connection
> established
> Dec 27 23:41:01 ucs22 cib: [4992]: ERROR: cib_process_request: Operation
> ignored, cluster configuration is invalid. Please repair and restart: Update
> does not conform to the configured schema/DTD
> Dec 27 23:41:01 ucs22 cib: [4992]: info: cib_client_status_callback: Status
> update: Client ucs26/cib now has status [online]
> Dec 27 23:41:01 ucs22 cib: [4992]: info: crm_new_peer: Node 0 is now known
> as ucs26
> Dec 27 23:41:01 ucs22 cib: [4992]: info: crm_update_peer_proc: ucs26.cib is
> now online
> Dec 27 23:41:01 ucs22 crmd: [4996]: info: register_heartbeat_conn: Hostname:
> ucs22
> Dec 27 23:41:01 ucs22 crmd: [4996]: info: register_heartbeat_conn: UUID:
> b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
> Dec 27 23:41:01 ucs22 crmd: [4996]: info: crm_cluster_connect: Connecting to
> Heartbeat
> Dec 27 23:41:02 ucs22 crmd: [4996]: info: do_ha_control: Connected to the
> cluster
> Dec 27 23:41:02 ucs22 crmd: [4996]: info: do_ccm_control: CCM connection
> established... waiting for first callback
> Dec 27 23:41:02 ucs22 crmd: [4996]: info: do_started: Delaying start, CCM
> (0000000000100000) not connected
> Dec 27 23:41:02 ucs22 cib: [4992]: ERROR: cib_process_request: Operation
> ignored, cluster configuration is invalid. Please repair and restart: Update
> does not conform to the configured schema/DTD
> Dec 27 23:41:02 ucs22 crmd: [4996]: ERROR: config_query_callback: Local CIB
> query resulted in an error: Update does not conform to the configured
> schema/DTD
> Dec 27 23:41:02 ucs22 crmd: [4996]: ERROR: config_query_callback: The
> cluster is mis-configured - shutting down and staying down
> Dec 27 23:41:02 ucs22 crmd: [4996]: notice: crmd_client_status_callback:
> Status update: Client ucs22/crmd now has status [online] (DC=false)
> Dec 27 23:41:02 ucs22 attrd: [4995]: info: cib_connect: Connected to the CIB
> after 1 signon attempts
> Dec 27 23:41:02 ucs22 attrd: [4995]: info: cib_connect: Sending full refresh
> Dec 27 23:41:02 ucs22 crmd: [4996]: info: crm_new_peer: Node 0 is now known
> as ucs22
> Dec 27 23:41:02 ucs22 crmd: [4996]: info: crm_update_peer_proc: ucs22.crmd
> is now online
> Dec 27 23:41:02 ucs22 crmd: [4996]: info: crmd_client_status_callback: Not
> the DC
> Dec 27 23:41:02 ucs22 crmd: [4996]: notice: crmd_client_status_callback:
> Status update: Client ucs22/crmd now has status [online] (DC=false)
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: crmd_client_status_callback: Not
> the DC
> Dec 27 23:41:03 ucs22 crmd: [4996]: notice: crmd_client_status_callback:
> Status update: Client ucs26/crmd now has status [online] (DC=false)
> Dec 27 23:41:03 ucs22 cib: [4992]: WARN: cib_peer_callback: Discarding
> cib_apply_diff message (808) from ucs26: not in our membership
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: crm_new_peer: Node 0 is now known
> as ucs26
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: crm_update_peer_proc: ucs26.crmd
> is now online
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: crmd_client_status_callback: Not
> the DC
> Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_log: FSA: Input I_ERROR from
> config_query_callback() received in state S_STARTING
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_state_transition: State
> transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL
> origin=config_query_callback ]
> Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_recover: Action A_RECOVER
> (0000000001000000) not supported
> Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_log: FSA: Input I_ERROR from
> revision_check_callback() received in state S_RECOVERY
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_dc_release: DC role released
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_te_control: Transitioner is now
> inactive
> Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_started: Start cancelled...
> S_RECOVERY
> Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_log: FSA: Input I_TERMINATE
> from do_recover() received in state S_RECOVERY
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_state_transition: State
> transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE
> cause=C_FSA_INTERNAL origin=do_recover ]
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_shutdown: All subsystems
> stopped, continuing
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_lrm_control: Disconnected from
> the LRM
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_ha_control: Disconnected from
> Heartbeat
> Dec 27 23:41:03 ucs22 ccm: [4991]: info: client (pid=4996) removed from ccm
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_cib_control: Disconnecting CIB
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: crmd_cib_connection_destroy:
> Connection to the CIB terminated...
> Dec 27 23:41:03 ucs22 cib: [4992]: ERROR: cib_process_request: Operation
> ignored, cluster configuration is invalid. Please repair and restart: Update
> does not conform to the configured schema/DTD
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_exit: Performing A_EXIT_0 -
> gracefully exiting the CRMd
> Dec 27 23:41:03 ucs22 cib: [4992]: WARN: send_ipc_message: IPC Channel to
> 4996 is not connected
> Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_exit: Could not recover from
> internal error
> Dec 27 23:41:03 ucs22 cib: [4992]: WARN: send_via_callback_channel: Delivery
> of reply to client 4996/78d426cb-9410-4d60-99fd-42fa190683c7 failed
> Dec 27 23:41:03 ucs22 crmd: [4996]: WARN: do_exit: Inhibiting respawn by
> Heartbeat
> Dec 27 23:41:03 ucs22 cib: [4992]: WARN: do_local_notify: A-Sync reply to
> crmd failed: reply failed
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: free_mem: Dropping
> I_RELEASE_SUCCESS: [ state=S_TERMINATE cause=C_FSA_INTERNAL
> origin=do_dc_release ]
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: free_mem: Dropping I_TERMINATE: [
> state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
> Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_exit: [crmd] stopped (100)
> Dec 27 23:41:04 ucs22 kernel: device eth2 entered promiscuous mode
> Dec 27 23:41:05 ucs22 kernel: md: stopping all md devices.
> Dec 27 23:41:06 ucs22 kernel: Synchronizing SCSI cache for disk sdb:
> Dec 27 23:41:06 ucs22 kernel: Synchronizing SCSI cache for disk sda:
> Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:12:00.1
> disabled
> Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:12:00.0
> disabled
> Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:08:00.1
> disabled
> Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:08:00.0
> disabled
> Dec 27 23:41:06 ucs22 kernel: usb 8-1: new full speed USB device using
> uhci_hcd and address 2
> Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:05:00.1
> disabled
> Dec 27 23:41:06 ucs22 kernel: usb 8-1: not running at top speed; connect to
> a high speed hub
> Dec 27 23:41:06 ucs22 kernel: usb 8-1: configuration #1 chosen from 1 choice
> Dec 27 23:41:06 ucs22 kernel: hub 8-1:1.0: USB hub found
> Dec 27 23:41:06 ucs22 kernel: hub 8-1:1.0: 4 ports detected
> Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Dec 27 23:47:35 ucs22 syslogd 1.4.1: restart.
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
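
For reference, the ha.cf change suggested in the reply would look roughly like the excerpt below. This is only a sketch: the thread does not say where ha.cf lives on this machine or what else it contains, so only the one directive is shown and the rest of the file is assumed to stay as it is. As far as I understand the directive, with "crm yes" heartbeat treats the exit of a CRM child process such as crmd as fatal and reboots the node, while with "crm respawn" it restarts the child instead.

```
# ha.cf (excerpt; all other directives unchanged)

# Before: the setting the reply suspects is behind the reboot.
# crm yes

# After: the value suggested in the reply.
crm respawn
```

Heartbeat most likely needs a restart to pick up the change. Note that this only affects what happens when crmd exits. The log above still shows that /var/lib/heartbeat/crm/cib.xml is not owned and r/w by user hacluster and does not validate against pacemaker-1.0, so the configuration itself has to be repaired as well.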
