Yes, I've seen the posts you suggested. Have you tested it? We are using the qib driver for QLogic 7342 adapters; I don't know whether anyone has ported the fix to it as well. A rough check is sketched below.
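
This is only a sketch of how one might verify what is actually running and whether a qib port of the fix exists; the module name ib_qib is the in-kernel one for these HCAs, but the kernel source path is an assumption, adjust it to your build tree:

    # which kernel and qib module are actually loaded
    uname -r
    modinfo ib_qib | grep -E '^(filename|version)'

    # in a matching kernel source tree, scan recent IPoIB/qib commits
    # for the deadlock fix discussed in the LKML threads
    cd /usr/src/linux
    git log --oneline -- drivers/infiniband/ulp/ipoib drivers/infiniband/hw/qib | head -30

If nothing relevant shows up under drivers/infiniband/hw/qib, the fix most likely has not been ported yet.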
Regards,
--
matteo

----- Original Message -----
> From: "Eric Blevins" <[email protected]>
> To: "Matteo Tescione" <[email protected]>
> Cc: [email protected]
> Sent: Friday, 16 January 2015 15:50:17
> Subject: Re: [DRBD-user] Possible IPoIB deadlock with DRBD
>
> The split brain would only happen on dual primary.
>
> We have Mellanox MHEA28-XTC adapters using the mthca driver.
>
> The potential IPoIB deadlock is only fixed in the mlx4 driver so far.
>
> common {
>   net {
>     connect-int 20;          # default 10; unit: 1 s
>     timeout 180;             # default 60; unit: 0.1 s (=> 18 s)
>     ping-int 30;             # default 10; unit: 1 s
>     ping-timeout 10;         # default 5; unit: 0.1 s (=> 1 s)
>     ko-count 20;
>     max-buffers 16000;
>     max-epoch-size 16000;
>     sndbuf-size 0;
>     rcvbuf-size 0;
>     unplug-watermark 16001;
>     verify-alg md5;
>   }
>   disk {
>     c-plan-ahead 10;
>     c-min-rate 30M;
>     c-max-rate 200M;
>     c-fill-target 20M;
>     al-extents 3389;
>     md-flushes no;
>     disk-barrier no;
>     disk-flushes no;
>   }
> }
> resource drbd0 {
>   device /dev/drbd0;
>   disk /dev/sdc;
>   meta-disk internal;
>   startup {
>     wfc-timeout 120;
>     degr-wfc-timeout 60;
>     outdated-wfc-timeout 60;
>     become-primary-on both;
>   }
>   disk {
>     c-max-rate 200M;
>     c-min-rate 30M;
>     c-fill-target 20M;
>     c-plan-ahead 10;
>   }
>   net {
>     protocol C;
>     cram-hmac-alg sha1;
>     shared-secret "XXXXXXXXXXXXX";
>     allow-two-primaries;
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>   }
>   on vm1 {
>     address x.x.x.1:7788;
>   }
>   on vm2 {
>     address x.x.x.2:7788;
>   }
> }
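
In case anyone else wants to try these settings, a minimal sketch for validating and applying them on a running node; drbd0 is the resource name from the config above, and drbdadm -d is a dry run:

    # sanity-check what drbdadm parsed from /etc/drbd.conf and drbd.d/
    drbdadm dump drbd0

    # preview, then apply the changed net/disk options without a restart
    drbdadm -d adjust drbd0
    drbdadm adjust drbd0

Note that adjust only reconfigures run-time options; it does not resync or touch on-disk data.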
>
> On Fri, Jan 16, 2015 at 5:32 AM, Matteo Tescione <[email protected]> wrote:
> > Hi Eric,
> >
> > it seems that I'm hitting the same deadlock, but I don't use dual
> > primary, and the split brain never occurs.
> >
> > Can you post your drbd config along with the InfiniBand HBA model
> > and version you're using?
> >
> > regards,
> >
> > --
> > matteo
> >
> > ----- Original Message -----
> >> From: "Eric Blevins" <[email protected]>
> >> To: [email protected]
> >> Sent: Thursday, 15 January 2015 17:53:48
> >> Subject: [DRBD-user] Possible IPoIB deadlock with DRBD
> >>
> >> We are using Proxmox with DRBD in dual primary, with IPoIB for transport.
> >> We recently tested Proxmox's upcoming 3.10 kernel, based on the kernel
> >> from RHEL 7, and started having problems with DRBD.
> >>
> >> The kernel came with DRBD 8.4.3; I have also compiled and installed
> >> 8.4.5, and both experience the same problem.
> >>
> >> During times of heavy IO load (backups) DRBD will time out and split
> >> brain; I have included some logs below.
> >> I stumbled on a couple of LKML threads that discuss a deadlock between
> >> IPoIB and IO that happens over IPoIB, such as iSCSI or NFS:
> >> https://lkml.org/lkml/2014/2/21/655
> >> http://lkml.org/lkml/2014/4/24/543
> >>
> >> Is it likely that DRBD could also trigger the deadlock discussed on LKML?
> >> If not, do you have any other suggestions on how I can prevent this
> >> timeout?
> >>
> >> Node A:
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335766] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335782] drbd drbd0: asender terminated
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335784] drbd drbd0: Terminating drbd_a_drbd0
> >> Jan  5 03:23:51 vm6 kernel: [2221944.335846] block drbd0: new current UUID BD9DB97EC672F5C9:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
> >> Jan  5 03:23:51 vm6 kernel: [2221944.347788] drbd drbd0: Connection closed
> >> Jan  5 03:23:51 vm6 kernel: [2221944.347834] drbd drbd0: conn( Timeout -> Unconnected )
> >> Jan  5 03:23:51 vm6 kernel: [2221944.347836] drbd drbd0: receiver terminated
> >>
> >> Node B:
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170391] drbd drbd0: sock was shut down by peer
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170409] drbd drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170412] drbd drbd0: short read (expected size 16)
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170421] drbd drbd0: asender terminated
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170423] drbd drbd0: Terminating drbd_a_drbd0
> >> Jan  5 03:23:51 vm5 kernel: [2223090.170480] block drbd0: new current UUID 2628F73F9DAE5EDF:8F2DD469C771058B:925C07CF6316212D:925B07CF6316212D
> >> Jan  5 03:23:51 vm5 kernel: [2223090.185536] drbd drbd0: Connection closed
> >> Jan  5 03:23:51 vm5 kernel: [2223090.185585] drbd drbd0: conn( BrokenPipe -> Unconnected )
> >> Jan  5 03:23:51 vm5 kernel: [2223090.185587] drbd drbd0: receiver terminated
> >>
> >> Eric

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
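
For the archive: if a timeout like the one logged above does end in split brain, the usual manual recovery on DRBD 8.4 (as documented in the DRBD user's guide) is roughly the following; drbd0 is the resource name as above, and you must first decide which node's data survives:

    # on the node whose changes are to be discarded
    drbdadm disconnect drbd0
    drbdadm secondary drbd0
    drbdadm connect --discard-my-data drbd0

    # on the surviving node, only if it went StandAlone
    drbdadm connect drbd0

    # watch the resync
    cat /proc/drbd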
