It seems to me that adding a configurable timeout indicating how long to wait 
before allowing promotion is required, possibly infinite by default.
I understand why you might want to wait for either the primary to come back up 
or for manual recovery.
However, in an active-standby two-node setup where the system is required to be 
up ALL the time, there is another approach: promote the old secondary after a 
timeout.
If the old primary was down for a long time, we are back up quickly and the old 
primary should resync - fine.
If the old primary was down only briefly, but beyond the timeout, the 
split-brain handlers should recover, possibly with manual intervention.
That is acceptable, since we cannot wait forever.
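
To make this concrete, here is a minimal sketch of the decision logic I have in 
mind, written as Python pseudocode. The names (PROMOTION_TIMEOUT, 
peer_is_reachable, promote_secondary) are hypothetical placeholders, not 
existing DRBD or Pacemaker interfaces:

    import time

    # Hypothetical sketch of the proposed behaviour. The liveness check and
    # the promotion call are placeholders for whatever the cluster stack
    # actually provides.

    PROMOTION_TIMEOUT = 300  # seconds; None would mean "wait forever"
    POLL_INTERVAL = 5        # seconds between peer checks

    def peer_is_reachable() -> bool:
        """Placeholder for a real peer-liveness check (e.g. membership)."""
        raise NotImplementedError

    def promote_secondary() -> None:
        """Placeholder for the promotion, e.g. 'drbdadm primary'."""
        raise NotImplementedError

    def try_promote_after_timeout() -> None:
        deadline = (None if PROMOTION_TIMEOUT is None
                    else time.monotonic() + PROMOTION_TIMEOUT)
        while True:
            if peer_is_reachable():
                # Old primary came back in time: let it resync normally,
                # no forced promotion needed.
                return
            if deadline is not None and time.monotonic() >= deadline:
                # Timeout expired: promote anyway, accepting that the
                # split-brain handlers (or manual recovery) may have to
                # reconcile later.
                promote_secondary()
                return
            time.sleep(POLL_INTERVAL)

With PROMOTION_TIMEOUT = None this degenerates to the current "wait forever" 
behaviour, which is why I would suggest that as the default.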

What say you?
Oren

> Date: Thu, 19 Jan 2012 23:15:00 +0100
> From: [email protected]
> To: [email protected]
> Subject: Re: [DRBD-user] Promote fails in state = { cs:WFConnection 
> ro:Secondary/Unknown ds:Consistent/DUnknown r--- }
> 
> On Thu, Jan 19, 2012 at 11:52:03AM +0000, Oren Nechushtan wrote:
> > 
> > Hi everyone,
> > First, I would like to express my pleasure using DRBD!
> > Here is my situation:
> >  
> > Two-node setup, using cman and pacemaker, don't care about quorum, no stonith
> > Master-Slave DRBD resource
> > Fence resource only
> > I noticed that under certain conditions (powering nodes on/off enough times) 
> > the secondary node may never become promoted when the primary is shut down. 
> 
> I *think* that is intentional, and preventing potential data divergence,
> in the following scenario:
> 
>  * all good, Primary --- connected --- Secondary
>  * Kill Secondary, Primary continues.
>  * Powerdown Primary.
>  * Bring up Secondary only.
> 
> What use is fencing, if a fencing loop would cause data divergence anyway?
> 
> > Here is a sample log (attached)
> >  
> > Jan 18 08:34:52 NODE-1 crmd: [2054]: info: do_lrm_rsc_op: Performing 
> > key=7:89911:0:aac20e27-939f-439c-b461-e668262718b3 
> > op=drbd_fsroot:0_promote_0 )
> > Jan 18 08:34:52 NODE-1 lrmd: [2051]: info: rsc:drbd_fsroot:0:299768: promote
> > Jan 18 08:34:52 NODE-1 kernel: block drbd0: helper command: /sbin/drbdadm 
> > fence-peer minor-0
> > Jan 18 08:34:52 NODE-1 corosync[1759]:   [TOTEM ] Automatically recovered 
> > ring 1
> > Jan 18 08:34:53 NODE-1 crm-fence-peer.sh[24325]: invoked for fsroot
> > Jan 18 08:34:53 NODE-1 corosync[1759]:   [TOTEM ] Automatically recovered 
> > ring 1
> 
> > Jan 18 08:34:53 NODE-1 crm-fence-peer.sh[24325]: WARNING peer is 
> > unreachable, my disk is Consistent: did not place the constraint!
> 
> This is it.
> 
> -- 
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com