sbd has a connection to Pacemaker to establish overall cluster health (the -P flag). This seems to be where the problem is; I just don't know what the cause might be.
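To put numbers on the flapping, here is a minimal sketch that measures how long each UNHEALTHY window in the log quoted below lasts. It assumes the syslog lines are unwrapped (one message per line, unlike the mail-wrapped quote), that the year is 2014 (syslog timestamps carry no year), and an English locale for month names. The names `HEALTH_RE` and `unhealthy_windows` are made up for this example and are not part of sbd.

```python
import re
from datetime import datetime

# Matches the two health-check messages that appear in the quoted sbd log.
HEALTH_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+sbd:.*"
    r"Pacemaker health check:\s*(?P<state>OK|UNHEALTHY)"
)


def unhealthy_windows(lines, year=2014):
    """Yield (start, end, seconds) for each UNHEALTHY -> OK transition.

    `year` is an assumption: syslog timestamps do not include one.
    """
    start = None
    for line in lines:
        m = HEALTH_RE.search(line)
        if not m:
            continue  # not a health-check message
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        if m.group("state") == "UNHEALTHY":
            if start is None:
                start = ts  # remember when the window opened
        elif start is not None:
            yield start, ts, int((ts - start).total_seconds())
            start = None


# Two lines taken from the quoted log, with the mail wrapping undone.
sample = [
    "Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY",
    "Apr 21 22:16:37 qaxen6 sbd: [12971]: info: Pacemaker health check: OK",
]

for start, end, secs in unhealthy_windows(sample):
    print(f"{start} unhealthy for {secs} s")  # 2014-04-21 22:15:01 unhealthy for 96 s
```

Feeding it the full log from /var/log/messages (path may differ on your system) would show whether the windows cluster around particular durations, which may help when comparing against the sbd timeouts in the dump.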

On 23/04/14 11:32 AM, emmanuel segura wrote:
> what do you mean with link?
>
> 2014-04-23 15:23 GMT+02:00 Tom Parker <[email protected]>:
>
>> ok. I have fixed that to be no_path_retry fail but I don't think this
>> has anything to do with the errors I am seeing.
>>
>> They seem to be related to sbd's link with my cluster, not with disk I/O
>>
>> Tom
>>
>> On 23/04/14 03:11 AM, emmanuel segura wrote:
>>> the first thing, you are using no_path_retry in wrong way in your
>>> multipath, try to read this
>>> http://www.novell.com/documentation/oes2/clus_admin_lx/data/bl9ykz6.html
>>>
>>> 2014-04-22 20:41 GMT+02:00 Tom Parker <[email protected]>:
>>>
>>>> I have attached the config files to this e-mail. The sbd dump is below
>>>>
>>>> [LIVE] qaxen1:~ # sbd -d /dev/mapper/qa-xen-sbd dump
>>>> ==Dumping header on disk /dev/mapper/qa-xen-sbd
>>>> Header version : 2.1
>>>> UUID : ae835596-3d26-4681-ba40-206b4d51149b
>>>> Number of slots : 255
>>>> Sector size : 512
>>>> Timeout (watchdog) : 45
>>>> Timeout (allocate) : 2
>>>> Timeout (loop) : 1
>>>> Timeout (msgwait) : 90
>>>> ==Header on disk /dev/mapper/qa-xen-sbd is dumped
>>>>
>>>> On 22/04/14 02:30 PM, emmanuel segura wrote:
>>>>> you are missing cluster configuration and sbd configuration and
>>>>> multipath config
>>>>>
>>>>> 2014-04-22 20:21 GMT+02:00 Tom Parker <[email protected]>:
>>>>>
>>>>>> Has anyone seen this? Do you know what might be causing the flapping?
>>>>>>
>>>>>> Apr 21 22:03:03 qaxen6 sbd: [12962]: info: Watchdog enabled.
>>>>>> Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Servant starting for device /dev/mapper/qa-xen-sbd
>>>>>> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Monitoring Pacemaker health
>>>>>> Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Device /dev/mapper/qa-xen-sbd uuid: ae835596-3d26-4681-ba40-206b4d51149b
>>>>>> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Legacy plug-in detected, AIS quorum check enabled
>>>>>> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Waiting to sign in with cluster ...
>>>>>> Apr 21 22:03:04 qaxen6 sbd: [12971]: notice: Using watchdog device: /dev/watchdog
>>>>>> Apr 21 22:03:04 qaxen6 sbd: [12971]: info: Set watchdog timeout to 45 seconds.
>>>>>> Apr 21 22:03:04 qaxen6 sbd: [12974]: info: Waiting to sign in with cluster ...
>>>>>> Apr 21 22:03:06 qaxen6 sbd: [12974]: info: We don't have a DC right now.
>>>>>> Apr 21 22:03:08 qaxen6 sbd: [12974]: WARN: Node state: UNKNOWN
>>>>>> Apr 21 22:03:09 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 21 22:03:09 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 21 22:03:10 qaxen6 sbd: [12974]: WARN: Node state: pending
>>>>>> Apr 21 22:03:11 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 21 22:16:37 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 21 22:16:37 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 21 22:25:08 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 21 22:25:08 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 21 22:26:44 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 21 22:26:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 21 22:39:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 21 22:39:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 21 22:42:44 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 21 22:42:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 22 01:36:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 22 01:36:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 22 01:36:34 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 22 01:36:34 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 22 06:53:15 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 22 06:53:15 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 22 06:54:03 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 22 06:54:03 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 22 09:57:21 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 22 09:57:21 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 22 09:58:12 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 22 09:58:12 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 22 10:59:49 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 22 10:59:49 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 22 11:00:41 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 22 11:00:41 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 22 11:50:55 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 22 11:50:55 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 22 11:51:06 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 22 11:51:06 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 22 13:09:12 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 22 13:09:12 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 22 13:09:35 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 22 13:09:35 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 22 13:31:35 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 22 13:31:35 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 22 13:31:44 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 22 13:31:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 22 13:32:52 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 22 13:32:52 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 22 13:33:01 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 22 13:33:01 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 22 13:44:39 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 22 13:44:39 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 22 13:44:47 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 22 13:44:47 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>> Apr 22 14:07:42 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>>>>>> Apr 22 14:07:42 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
>>>>>> Apr 22 14:07:51 qaxen6 sbd: [12974]: info: Node state: online
>>>>>> Apr 22 14:07:51 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>>>>>>
>>>>>> _______________________________________________
>>>>>> Linux-HA mailing list
>>>>>> [email protected]
>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>> See also: http://linux-ha.org/ReportingProblems

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
