Yes, I've seen the posts you suggested.
Have you tested it?
We are using the qib driver for QLogic 7342 adapters; I don't know whether
anyone has ported the fix to it as well.
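
For reference, this is roughly how we check which HCA driver a node is
actually using (standard sysfs path and module names; adjust if your
stack differs):

  # list the InfiniBand devices the kernel knows about
  ls /sys/class/infiniband
  # see which of the candidate HCA modules are loaded
  lsmod | grep -E 'ib_qib|ib_mthca|mlx4_ib'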

Regards,

--
matteo

----- Original Message -----
> From: "Eric Blevins" <[email protected]>
> To: "Matteo Tescione" <[email protected]>
> Cc: [email protected]
> Sent: Friday, 16 January 2015 15:50:17
> Subject: Re: [DRBD-user] Possible IPoIB deadlock with DRBD
> 
> The split brain would only happen on dual primary.
> 
> We have Mellanox MHEA28-XTC using mthca driver.
> 
> The potential IPoIB deadlock is only fixed in the mlx4 driver so far.
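> 
> One mitigation sometimes suggested for this class of allocation
> deadlock (my own guess, not something the LKML threads prescribe for
> DRBD) is to enlarge the kernel's free-memory reserve, so allocations
> made while dirty pages drain over IPoIB are less likely to stall:
> 
>   # reserve ~256 MB of free memory; the value is a guess, tune per host
>   sysctl -w vm.min_free_kbytes=262144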
> 
> 
> 
> common {
>   net {
>     connect-int 20;         # default 10, unit 1 s
>     timeout 180;            # default 60, unit 0.1 s (18 s)
>     ping-int 30;            # default 10, unit 1 s
>     ping-timeout 10;        # default 5, unit 0.1 s (1 s)
>     ko-count 20;            # drop peer after 20 * timeout of stalled writes
>     max-buffers 16000;      # buffers for receiving/writing data
>     max-epoch-size 16000;   # max write requests between two barriers
>     sndbuf-size 0;          # 0 = auto-tune socket send buffer
>     rcvbuf-size 0;          # 0 = auto-tune socket receive buffer
>     unplug-watermark 16001; # > max-buffers, so explicit unplugging never kicks in
>     verify-alg md5;         # checksum used by online verify
>   }
>   disk {
>     c-plan-ahead 10;    # dynamic resync controller plans 1 s ahead (unit 0.1 s)
>     c-min-rate 30M;     # resync floor while application I/O is active
>     c-max-rate 200M;    # resync rate ceiling
>     c-fill-target 20M;  # resync data to keep in flight
>     al-extents 3389;    # activity log size
>     md-flushes no;      # flushes/barriers off: assumes non-volatile write cache
>     disk-barrier no;
>     disk-flushes no;
>   }
> }
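> 
> For context on the numbers above: DRBD expects timeout to stay below
> both connect-int and ping-int, which holds here (18 s < 20 s < 30 s),
> and with ko-count 20 a peer that stops acknowledging writes is dropped
> after roughly
> 
>   ko-count * timeout = 20 * 18 s = 360 s
> 
> of stalled I/O.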
> resource drbd0 {
>   device /dev/drbd0;
>   disk /dev/sdc;
>   meta-disk internal;
>   startup {
>     wfc-timeout  120;
>     degr-wfc-timeout 60;
>     outdated-wfc-timeout 60;
>     become-primary-on both;
>   }
>   disk {
>     c-max-rate 200M;
>     c-min-rate 30M;
>     c-fill-target 20M;
>     c-plan-ahead 10;
>   }
>   net {
>     protocol C;
>     cram-hmac-alg sha1;
>     shared-secret "XXXXXXXXXXXXX";
>     allow-two-primaries;
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>   }
>   on vm1 {
>     address x.x.x.1:7788;
>   }
>   on vm2 {
>     address x.x.x.2:7788;
>   }
> }
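> 
> To double-check which values are actually in effect at runtime (the
> resource-level disk section overrides the common one), something like
> 
>   drbdsetup show drbd0
> 
> or 'drbdadm dump drbd0' should print the live/parsed configuration.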
> 
> On Fri, Jan 16, 2015 at 5:32 AM, Matteo Tescione <[email protected]>
> wrote:
> > Hi Eric,
> >
> > It seems that I'm hitting the same deadlock, but I don't use dual
> > primary and the split brain never occurs.
> >
> > Can you post your DRBD config, along with the InfiniBand HBA model
> > and the version you're using?
> >
> > regards,
> >
> > --
> > matteo
> >
> > ----- Original Message -----
> >> From: "Eric Blevins" <[email protected]>
> >> To: [email protected]
> >> Sent: Thursday, 15 January 2015 17:53:48
> >> Subject: [DRBD-user] Possible IPoIB deadlock with DRBD
> >>
> >> We are using Proxmox with DRBD in dual primary, with IPoIB as the
> >> transport. We recently tested Proxmox's upcoming 3.10 kernel, based
> >> on the kernel from RHEL 7, and started having problems with DRBD.
> >>
> >> That kernel ships with DRBD 8.4.3; I have also compiled and
> >> installed 8.4.5, and both exhibit the same problem.
> >>
> >> Under heavy I/O load (backups) DRBD times out and split-brains; I
> >> have included some logs below.
> >> I stumbled on a couple of LKML threads that discuss a deadlock
> >> between IPoIB and I/O carried over IPoIB, such as iSCSI or NFS:
> >> https://lkml.org/lkml/2014/2/21/655
> >> http://lkml.org/lkml/2014/4/24/543
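> >>
> >> (One knob we have been experimenting with, purely a guess on our
> >> part, is shrinking writeback bursts so that less dirty data has to
> >> drain over IPoIB at once during the backups, e.g.:
> >>
> >>   sysctl -w vm.dirty_background_ratio=5
> >>   sysctl -w vm.dirty_ratio=10
> >>
> >> The values are illustrative, not a recommendation.)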
> >>
> >> Is it likely that DRBD could also trigger the deadlock discussed
> >> on LKML?
> >> If not, do you have any other suggestions on how I can prevent
> >> this timeout?
> >>
> >>
> >> Node A:
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335766] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335782] drbd drbd0: asender terminated
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335784] drbd drbd0: Terminating drbd_a_drbd0
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335846] block drbd0: new current UUID BD9DB97EC672F5C9:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
> >> Jan  5 03:23:51 vm6 kernel: [2221944.347788] drbd drbd0: Connection closed
> >> Jan  5 03:23:51 vm6 kernel: [2221944.347834] drbd drbd0: conn( Timeout -> Unconnected )
> >> Jan  5 03:23:51 vm6 kernel: [2221944.347836] drbd drbd0: receiver terminated
> >>
> >>
> >> Node B:
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170391] drbd drbd0: sock was shut down by peer
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170409] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170412] drbd drbd0: short read (expected size 16)
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170421] drbd drbd0: asender terminated
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170423] drbd drbd0: Terminating drbd_a_drbd0
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170480] block drbd0: new current UUID 2628F73F9DAE5EDF:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
> >> Jan  5 03:23:51 vm5 kernel: [2223090.185536] drbd drbd0: Connection closed
> >> Jan  5 03:23:51 vm5 kernel: [2223090.185585] drbd drbd0: conn( BrokenPipe -> Unconnected )
> >> Jan  5 03:23:51 vm5 kernel: [2223090.185587] drbd drbd0: receiver terminated
> >>
> >> Eric
> 
> 
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
