On Oct 8, 2012, at 9:19 AM, Velayutham, Prakash wrote:

> On Oct 8, 2012, at 4:55 AM, Lars Ellenberg wrote:
> 
>> On Sat, Oct 06, 2012 at 01:08:43PM +0000, Velayutham, Prakash wrote:
>>> Hi,
>>> 
>>> I recently got a DRBD (8.4.2-2) cluster up (still testing). It seems to 
>>> work nicely with Pacemaker CRM in several scenarios I have tested. Here is 
>>> my config.
>>> 
>>> global {
>>>               usage-count     yes;
>>> }
>>> 
>>> common {
>>>       handlers {
>>>               outdate-peer    /usr/lib/drbd/crm-fence-peer.sh;
>>>               fence-peer      /usr/lib/drbd/crm-fence-peer.sh;
>>>               after-resync-target     /usr/lib/drbd/crm-unfence-peer.sh;
>>>               local-io-error "/usr/lib/drbd/notify-io-error.sh; 
>>> /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; 
>>> halt -f";
>>>               split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>>>       }
>>> 
>>>       startup {
>>>               degr-wfc-timeout        0;
>>>       }
>>> 
>>>       net {
>>>               shared-secret   1QP69G4kWDslx2TMiaEStI6bwaGH5y8d;
>>>               after-sb-0pri discard-zero-changes;
>>>               after-sb-1pri discard-secondary;
>>>               after-sb-2pri disconnect;
>>>       }
>>> 
>>>       disk {
>>>               on-io-error     call-local-io-error;
>>>               fencing resource-and-stonith;
>>>       }
>>> 
>>> }
>>> 
>>> The local-io-error handler only gets called when the primary node has a disk
>>> issue. I have not seen the secondary node invoke it when it had disk access
>>> issues. Is this by design?
>> 
>> No.
>> 
>> "Works for me", though.
>> 
>> Can you please double check?
>> And if in fact you can reproduce, tell us how, including logs?
>> 
>> 
>> Thanks,
>> 
>> -- 
>> : Lars Ellenberg
> 
> Hi Lars,
> 
> If I disable all the FC ports on the FC switch just for the primary node, that
> node is fenced, reboots and comes back up, as I would expect. With the exact
> same config, if I disable the FC ports just for the secondary node, that node
> just sits there and even still shows up as Secondary in /proc/drbd. That seems
> odd; it behaves as if the config were set to just go diskless, but it is
> actually set to "call-local-io-error".
> 
> Here is the full config.
> 
> /etc/drbd.conf
> 
> ## generated by drbd-gui
> 
> include "drbd.d/global_common.conf";
> include "drbd.d/*.res";
> 
> /etc/drbd.d/global_common.conf:
> 
> ## generated by drbd-gui
> 
> global {
>               usage-count     yes;
> }
> 
> common {
>       handlers {
>               fence-peer      /usr/lib/drbd/crm-fence-peer.sh;
>               after-resync-target     /usr/lib/drbd/crm-unfence-peer.sh;
>               local-io-error "/usr/lib/drbd/notify-io-error.sh; 
> /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; 
> halt -f";
>               split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>       }
> 
>       startup {
>               degr-wfc-timeout        0;
>       }
> 
>       net {
>               shared-secret   1QP69G4kWDslx2TMiaEStI6bwaGH5y8d;
>               after-sb-0pri discard-zero-changes;
>               after-sb-1pri discard-secondary;
>               after-sb-2pri disconnect;
>       }
> 
>       disk {
>               on-io-error     call-local-io-error;
>               fencing resource-and-stonith;
>       }
> 
> }
> 
> /etc/drbd.d/mysql1.res:
> 
> resource mysql1 {
>       net {
>               cram-hmac-alg   sha1;
>       }
> 
>       on bmimysqlt3.x.x.x {
>               volume 0 {
>                       device          /dev/drbd0;
>                       disk            /dev/mapper/mysql_data1;
>                       flexible-meta-disk      internal;
>               }
>               address         x.x.x.x:7788;
>       }
>       on bmimysqlt4.x.x.x {
>               volume 0 {
>                       device          /dev/drbd0;
>                       disk            /dev/mapper/mysql_data1;
>                       flexible-meta-disk      internal;
>               }
>               address         x.x.x.x:7788;
>       }
> }
> 
> Which logs would you like me to share?
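> 
> (My guess is the kernel log from both nodes around the time of the test. If
> so, I would grab something along these lines, but tell me if you need more:)
> 
>     # run on each node right after the FC port is disabled
>     dmesg | grep -i -e drbd -e 'i/o error' | tail -n 100
>     grep -i drbd /var/log/messages | tail -n 100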
> 
> Thanks,
> Prakash

Just wanted to add this: I repeated my test and got the exact same results.
Here is /proc/drbd on the primary (bmimysqlt3) and the secondary (bmimysqlt4)
before the secondary's disk is cut off (by disabling the FC switch port that
the secondary is connected to).

[root@bmimysqlt3 ~]# cat /proc/drbd 
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by 
[email protected], 2012-10-02 00:02:32
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:184 nr:0 dw:160 dr:14317 al:6 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@bmimysqlt4 ~]# cat /proc/drbd 
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by 
[email protected], 2012-10-02 00:02:32
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:184 dw:184 dr:0 al:0 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Here is /proc/drbd on the primary and the secondary about 5 minutes after the
secondary's disk is cut off.

[root@bmimysqlt3 ~]# cat /proc/drbd 
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by 
[email protected], 2012-10-02 00:02:32
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:184 nr:0 dw:160 dr:14317 al:6 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@bmimysqlt4 ~]# cat /proc/drbd 
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by 
[email protected], 2012-10-02 00:02:32
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:184 dw:184 dr:0 al:0 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

As you can see, there is nothing there to suggest that the secondary even
noticed the I/O error; the connection state, disk state and counters are all
unchanged.
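
The next time I run this, I will also check directly on the secondary whether
its backing device is actually failing I/O once the port is disabled, rather
than only looking at /proc/drbd. A rough sketch of what I have in mind
(mysql_data1 being a multipath map is my assumption, and if multipath is
configured to queue I/O instead of failing it, the read below would simply
hang rather than return an error):

# direct read straight from the backing device; should fail if the LUN is gone
[root@bmimysqlt4 ~]# dd if=/dev/mapper/mysql_data1 of=/dev/null bs=4096 count=1 iflag=direct
# if mysql_data1 really is a multipath map, check whether its paths are marked failed
[root@bmimysqlt4 ~]# multipath -ll mysql_data1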

I can't understand what is going on.
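
If it helps, here is roughly what I plan to capture on the secondary right
after the port is disabled the next time around, so you can see exactly what
the node thinks its state and settings are:

[root@bmimysqlt4 ~]# drbdadm dstate mysql1
[root@bmimysqlt4 ~]# drbdadm cstate mysql1
[root@bmimysqlt4 ~]# drbdadm dump mysql1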

Thanks,
Prakash

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
