On Mon, Dec 27, 2010 at 08:02:12PM +0800, Bin Chen(sunwen_ling) wrote:
> On Mon, Dec 27, 2010 at 7:14 PM, Dejan Muhamedagic <[email protected]> wrote:
>
> > Hi,
> >
> > On Mon, Dec 27, 2010 at 04:27:14PM +0800, Bin Chen(sunwen_ling) wrote:
> > > Hi guys,
> > >
> > > I installed Linux Heartbeat on one machine. The problem is that a few
> > > seconds after we started Heartbeat, the machine rebooted. I can see
> > > that the cause is an invalid configuration cib.xml, but is it correct
> > > behavior for a machine with an invalid cib.xml to be reset? By the
> > > way, STONITH is disabled; the log is attached. If this behavior is
> > > correct, can I disable the reset? We are on a server machine and the
> > > reboot process is painfully slow.
> >
> > I guess that this is because of the "crm yes" directive in ha.cf.
> > You can change it to "crm respawn".
> >
> > Thanks,
> >
> > Dejan
> >
> Thanks Dejan, can you please explain why this option will cause the
> machine to be reset?
When a critical process exits, it is safer to restart the node.

Thanks,

Dejan

> Bin
> > > Thanks.
> > > Bin
> > >
> > > Dec 27 23:40:57 ucs22 lrmd: [4993]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
> > > Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added signal handler for signal 15
> > > Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Dec 27 23:40:57 ucs22 lrmd: [4993]: info: enabling coredumps
> > > Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> > > Dec 27 23:40:57 ucs22 lrmd: [4993]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> > > Dec 27 23:40:57 ucs22 lrmd: [4993]: info: Started.
> > > Dec 27 23:40:57 ucs22 stonithd: [4994]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
> > > Dec 27 23:40:57 ucs22 stonithd: [4994]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> > > Dec 27 23:40:57 ucs22 stonithd: [4994]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> > > Dec 27 23:40:57 ucs22 attrd: [4995]: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
> > > Dec 27 23:40:57 ucs22 attrd: [4995]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
> > > Dec 27 23:40:57 ucs22 attrd: [4995]: info: Invoked: /usr/lib64/heartbeat/attrd
> > > Dec 27 23:40:57 ucs22 cib: [4992]: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
> > > Dec 27 23:40:57 ucs22 crmd: [4996]: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
> > > Dec 27 23:40:57 ucs22 ccm: [4991]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
> > > Dec 27 23:40:57 ucs22 stonithd: [4994]: info: register_heartbeat_conn: Hostname: ucs22
> > > Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Starting up
> > > Dec 27 23:40:57 ucs22 cib: [4992]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
> > > Dec 27 23:40:57 ucs22 crmd: [4996]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
> > > Dec 27 23:40:57 ucs22 ccm: [4991]: info: Hostname: ucs22
> > > Dec 27 23:40:57 ucs22 stonithd: [4994]: info: register_heartbeat_conn: UUID: b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
> > > Dec 27 23:40:57 ucs22 cib: [4992]: info: Invoked: /usr/lib64/heartbeat/cib
> > > Dec 27 23:40:57 ucs22 crmd: [4996]: info: Invoked: /usr/lib64/heartbeat/crmd
> > >
> > > Dec 27 23:40:57 ucs22 stonithd: [4994]: info: crm_cluster_connect: Connecting to Heartbeat
> > > Dec 27 23:40:57 ucs22 cib: [4992]: info: G_main_add_TriggerHandler: Added signal manual handler
> > > Dec 27 23:40:57 ucs22 crmd: [4996]: info: main: CRM Hg Version: da7075976b5ff0bee71074385f8fd02f296ec8a3
> > > Dec 27 23:40:57 ucs22 cib: [4992]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Dec 27 23:40:57 ucs22 crmd: [4996]: info: crmd_init: Starting crmd
> > > Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: crm_is_writable: /var/lib/heartbeat/crm/cib.xml must be owned and r/w by user hacluster
> > > Dec 27 23:40:57 ucs22 crmd: [4996]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Dec 27 23:40:57 ucs22 cib: [4992]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> > > Dec 27 23:40:57 ucs22 cib: [4992]: WARN: validate_cib_digest: No on-disk digest present
> > > Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Expecting an element nodes, got nothing
> > > Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Invalid sequence in interleave
> > > Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Element configuration failed to validate content
> > > Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: Element cib failed to validate content
> > > Dec 27 23:40:57 ucs22 cib: [4992]: ERROR: readCibXmlFile: CIB does not validate with pacemaker-1.0
> > > Dec 27 23:40:57 ucs22 cib: [4992]: info: startCib: CIB Initialization completed successfully
> > > Dec 27 23:40:57 ucs22 attrd: [4995]: info: register_heartbeat_conn: Hostname: ucs22
> > > Dec 27 23:40:57 ucs22 attrd: [4995]: info: register_heartbeat_conn: UUID: b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
> > > Dec 27 23:40:57 ucs22 attrd: [4995]: info: crm_cluster_connect: Connecting to Heartbeat
> > > Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Cluster connection active
> > > Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Accepting attribute updates
> > > Dec 27 23:40:57 ucs22 attrd: [4995]: info: main: Starting mainloop...
> > > Dec 27 23:40:57 ucs22 stonithd: [4994]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
> > > Dec 27 23:40:57 ucs22 stonithd: [4994]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Dec 27 23:40:57 ucs22 cib: [4992]: info: register_heartbeat_conn: Hostname: ucs22
> > > Dec 27 23:40:57 ucs22 cib: [4992]: info: register_heartbeat_conn: UUID: b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
> > > Dec 27 23:40:57 ucs22 cib: [4992]: info: crm_cluster_connect: Connecting to Heartbeat
> > > Dec 27 23:40:57 ucs22 cib: [4992]: info: ccm_connect: Registering with CCM...
> > > Dec 27 23:40:57 ucs22 cib: [4992]: WARN: ccm_connect: CCM Activation failed
> > > Dec 27 23:40:57 ucs22 cib: [4992]: WARN: ccm_connect: CCM Connection failed 1 times (30 max)
> > > Dec 27 23:40:58 ucs22 crmd: [4996]: info: do_cib_control: Could not connect to the CIB service: connection failed
> > > Dec 27 23:40:58 ucs22 crmd: [4996]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> > > Dec 27 23:40:58 ucs22 crmd: [4996]: info: crmd_init: Starting crmd's mainloop
> > > Dec 27 23:40:58 ucs22 ccm: [4991]: info: G_main_add_SignalHandler: Added signal handler for signal 15
> > > Dec 27 23:41:00 ucs22 crmd: [4996]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
> > > Dec 27 23:41:00 ucs22 cib: [4992]: info: ccm_connect: Registering with CCM...
> > > Dec 27 23:41:00 ucs22 cib: [4992]: info: cib_init: Requesting the list of configured nodes
> > > Dec 27 23:41:00 ucs22 cib: [4992]: info: cib_init: Starting cib mainloop
> > > Dec 27 23:41:00 ucs22 cib: [4992]: info: cib_client_status_callback: Status update: Client ucs22/cib now has status [join]
> > > Dec 27 23:41:00 ucs22 cib: [4992]: info: crm_new_peer: Node 0 is now known as ucs22
> > > Dec 27 23:41:00 ucs22 cib: [4992]: info: crm_update_peer_proc: ucs22.cib is now online
> > > Dec 27 23:41:01 ucs22 cib: [4992]: info: cib_client_status_callback: Status update: Client ucs22/cib now has status [online]
> > > Dec 27 23:41:01 ucs22 crmd: [4996]: info: do_cib_control: CIB connection established
> > > Dec 27 23:41:01 ucs22 cib: [4992]: ERROR: cib_process_request: Operation ignored, cluster configuration is invalid. Please repair and restart: Update does not conform to the configured schema/DTD
> > > Dec 27 23:41:01 ucs22 cib: [4992]: info: cib_client_status_callback: Status update: Client ucs26/cib now has status [online]
> > > Dec 27 23:41:01 ucs22 cib: [4992]: info: crm_new_peer: Node 0 is now known as ucs26
> > > Dec 27 23:41:01 ucs22 cib: [4992]: info: crm_update_peer_proc: ucs26.cib is now online
> > > Dec 27 23:41:01 ucs22 crmd: [4996]: info: register_heartbeat_conn: Hostname: ucs22
> > > Dec 27 23:41:01 ucs22 crmd: [4996]: info: register_heartbeat_conn: UUID: b8fc4074-c40e-48e4-80ad-a9b63fd4bf77
> > > Dec 27 23:41:01 ucs22 crmd: [4996]: info: crm_cluster_connect: Connecting to Heartbeat
> > > Dec 27 23:41:02 ucs22 crmd: [4996]: info: do_ha_control: Connected to the cluster
> > > Dec 27 23:41:02 ucs22 crmd: [4996]: info: do_ccm_control: CCM connection established... waiting for first callback
> > > Dec 27 23:41:02 ucs22 crmd: [4996]: info: do_started: Delaying start, CCM (0000000000100000) not connected
> > > Dec 27 23:41:02 ucs22 cib: [4992]: ERROR: cib_process_request: Operation ignored, cluster configuration is invalid. Please repair and restart: Update does not conform to the configured schema/DTD
> > > Dec 27 23:41:02 ucs22 crmd: [4996]: ERROR: config_query_callback: Local CIB query resulted in an error: Update does not conform to the configured schema/DTD
> > > Dec 27 23:41:02 ucs22 crmd: [4996]: ERROR: config_query_callback: The cluster is mis-configured - shutting down and staying down
> > > Dec 27 23:41:02 ucs22 crmd: [4996]: notice: crmd_client_status_callback: Status update: Client ucs22/crmd now has status [online] (DC=false)
> > > Dec 27 23:41:02 ucs22 attrd: [4995]: info: cib_connect: Connected to the CIB after 1 signon attempts
> > > Dec 27 23:41:02 ucs22 attrd: [4995]: info: cib_connect: Sending full refresh
> > > Dec 27 23:41:02 ucs22 crmd: [4996]: info: crm_new_peer: Node 0 is now known as ucs22
> > > Dec 27 23:41:02 ucs22 crmd: [4996]: info: crm_update_peer_proc: ucs22.crmd is now online
> > > Dec 27 23:41:02 ucs22 crmd: [4996]: info: crmd_client_status_callback: Not the DC
> > > Dec 27 23:41:02 ucs22 crmd: [4996]: notice: crmd_client_status_callback: Status update: Client ucs22/crmd now has status [online] (DC=false)
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: crmd_client_status_callback: Not the DC
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: notice: crmd_client_status_callback: Status update: Client ucs26/crmd now has status [online] (DC=false)
> > > Dec 27 23:41:03 ucs22 cib: [4992]: WARN: cib_peer_callback: Discarding cib_apply_diff message (808) from ucs26: not in our membership
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: crm_new_peer: Node 0 is now known as ucs26
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: crm_update_peer_proc: ucs26.crmd is now online
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: crmd_client_status_callback: Not the DC
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_log: FSA: Input I_ERROR from config_query_callback() received in state S_STARTING
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_state_transition: State transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_log: FSA: Input I_ERROR from revision_check_callback() received in state S_RECOVERY
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_dc_release: DC role released
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_te_control: Transitioner is now inactive
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_started: Start cancelled... S_RECOVERY
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_shutdown: All subsystems stopped, continuing
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_lrm_control: Disconnected from the LRM
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_ha_control: Disconnected from Heartbeat
> > > Dec 27 23:41:03 ucs22 ccm: [4991]: info: client (pid=4996) removed from ccm
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_cib_control: Disconnecting CIB
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> > > Dec 27 23:41:03 ucs22 cib: [4992]: ERROR: cib_process_request: Operation ignored, cluster configuration is invalid. Please repair and restart: Update does not conform to the configured schema/DTD
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> > > Dec 27 23:41:03 ucs22 cib: [4992]: WARN: send_ipc_message: IPC Channel to 4996 is not connected
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: ERROR: do_exit: Could not recover from internal error
> > > Dec 27 23:41:03 ucs22 cib: [4992]: WARN: send_via_callback_channel: Delivery of reply to client 4996/78d426cb-9410-4d60-99fd-42fa190683c7 failed
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: WARN: do_exit: Inhibiting respawn by Heartbeat
> > > Dec 27 23:41:03 ucs22 cib: [4992]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: free_mem: Dropping I_RELEASE_SUCCESS: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_dc_release ]
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
> > > Dec 27 23:41:03 ucs22 crmd: [4996]: info: do_exit: [crmd] stopped (100)
> > > Dec 27 23:41:04 ucs22 kernel: device eth2 entered promiscuous mode
> > > Dec 27 23:41:05 ucs22 kernel: md: stopping all md devices.
> > > Dec 27 23:41:06 ucs22 kernel: Synchronizing SCSI cache for disk sdb:
> > > Dec 27 23:41:06 ucs22 kernel: Synchronizing SCSI cache for disk sda:
> > > Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:12:00.1 disabled
> > > Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:12:00.0 disabled
> > > Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:08:00.1 disabled
> > > Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:08:00.0 disabled
> > > Dec 27 23:41:06 ucs22 kernel: usb 8-1: new full speed USB device using uhci_hcd and address 2
> > > Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:05:00.1 disabled
> > > Dec 27 23:41:06 ucs22 kernel: usb 8-1: not running at top speed; connect to a high speed hub
> > > Dec 27 23:41:06 ucs22 kernel: usb 8-1: configuration #1 chosen from 1 choice
> > > Dec 27 23:41:06 ucs22 kernel: hub 8-1:1.0: USB hub found
> > > Dec 27 23:41:06 ucs22 kernel: hub 8-1:1.0: 4 ports detected
> > > Dec 27 23:41:06 ucs22 kernel: ACPI: PCI interrupt for device 0000:05:00.0 disabled
> > > Dec 27 23:47:35 ucs22 syslogd 1.4.1: restart.
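For readers hitting the same issue: the two ha.cf settings Dejan mentions differ in how Heartbeat reacts when a managed cluster process dies. A minimal sketch of the relevant /etc/ha.d/ha.cf excerpt follows; the comments summarize the semantics as discussed in this thread, so verify them against your Heartbeat version's documentation.

```
# /etc/ha.d/ha.cf (excerpt) -- illustrative, not a complete configuration
#
# crm yes      -> run the CRM; if a critical cluster process (cib, crmd,
#                 ...) exits, Heartbeat treats the node as unsafe and
#                 restarts it, matching the reboot seen in the log above.
# crm respawn  -> run the CRM, but respawn the failed process instead of
#                 rebooting the whole node.
crm respawn

# The log also shows the CIB failing pacemaker-1.0 schema validation and
# an ownership error from crm_is_writable; something along these lines
# should surface both problems before the next restart:
#   chown hacluster:haclient /var/lib/heartbeat/crm/cib.xml
#   crm_verify --xml-file /var/lib/heartbeat/crm/cib.xml
```

Note that even with "crm respawn", the crmd here exits with "Inhibiting respawn by Heartbeat" on a mis-configured CIB, so the cib.xml itself still needs repairing.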
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
