Hello,

We had DRBD 8.3 replicating over Mellanox InfiniBand cards using SDP, and it
worked fine.
After upgrading to 8.4, replication over SDP no longer works. Plain IP over
InfiniBand works, but with SDP I get the following logs on the "secondary" host:

Oct 17 11:36:27 s2 -bash: (4415) [root.root] |.| /etc/init.d/drbd start
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: applying 16k kernel stack fix up
Oct 17 11:36:27 s2 kernel: drbd: events: mcg drbd: 2
Oct 17 11:36:27 s2 kernel: drbd: initialized. Version: 8.4.6 (api:1/proto:86-101)
Oct 17 11:36:27 s2 kernel: drbd: GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by phil@Build64R6, 2015-04-09 14:35:00
Oct 17 11:36:27 s2 kernel: drbd: registered as block device major 147
Oct 17 11:36:27 s2 kernel: drbd infiniband: Starting worker thread (from drbdsetup-84 [21673])
Oct 17 11:36:27 s2 kernel: block drbd0: disk( Diskless -> Attaching ) 
Oct 17 11:36:27 s2 kernel: drbd infiniband: Method to ensure write ordering: drain
Oct 17 11:36:27 s2 kernel: block drbd0: max BIO size = 1048576
Oct 17 11:36:27 s2 kernel: block drbd0: drbd_bm_resize called with capacity == 2929267928
Oct 17 11:36:27 s2 multipathd: drbd0: add path (uevent)
Oct 17 11:36:27 s2 multipathd: drbd0: failed to get path uid
Oct 17 11:36:27 s2 multipathd: uevent trigger error
Oct 17 11:36:27 s2 kernel: block drbd0: resync bitmap: bits=366158491 words=5721227 pages=11175
Oct 17 11:36:27 s2 kernel: block drbd0: size = 1397 GB (1464633964 KB)
Oct 17 11:36:28 s2 kernel: block drbd0: recounting of set bits took additional 41 jiffies
Oct 17 11:36:28 s2 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Oct 17 11:36:28 s2 kernel: block drbd0: disk( Attaching -> UpToDate ) 
Oct 17 11:36:28 s2 kernel: block drbd0: attached to UUIDs 8C14BF163C91396E:9322D5CEC266CFFB:02BBF663E04A58FD:02BAF663E04A58FC
Oct 17 11:36:28 s2 kernel: drbd infiniband: conn( StandAlone -> Unconnected ) 
Oct 17 11:36:28 s2 kernel: drbd infiniband: Starting receiver thread (from drbd_w_infiniba [21676])
Oct 17 11:36:28 s2 kernel: drbd infiniband: receiver (re)started
Oct 17 11:36:28 s2 kernel: drbd infiniband: conn( Unconnected -> WFConnection ) 
Oct 17 11:36:39 s2 kernel: drbd infiniband: sock_recvmsg returned -11
Oct 17 11:36:39 s2 kernel: drbd infiniband: conn( WFConnection -> BrokenPipe ) 
Oct 17 11:36:39 s2 kernel: drbd infiniband: short read (expected size 8)
Oct 17 11:36:39 s2 kernel: drbd infiniband: Connection closed
Oct 17 11:36:39 s2 kernel: drbd infiniband: conn( BrokenPipe -> Unconnected ) 
Oct 17 11:36:40 s2 kernel: drbd infiniband: conn( Unconnected -> WFConnection ) 

It remains in this state until I interrupt it:

CTRL+C

Oct 17 11:36:54 s2 -bash: (4415) [root.root] |.| /etc/init.d/drbd stop
Oct 17 11:36:54 s2 kernel: drbd infiniband: conn( WFConnection -> Disconnecting ) 
Oct 17 11:36:54 s2 kernel: drbd infiniband: Discarding network configuration.
Oct 17 11:36:54 s2 kernel: drbd infiniband: Connection closed
Oct 17 11:36:54 s2 kernel: drbd infiniband: conn( Disconnecting -> StandAlone ) 
Oct 17 11:36:54 s2 kernel: drbd infiniband: receiver terminated
Oct 17 11:36:54 s2 kernel: drbd infiniband: Terminating drbd_r_infiniba
Oct 17 11:36:54 s2 kernel: block drbd0: disk( UpToDate -> Failed ) 
Oct 17 11:36:54 s2 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Oct 17 11:36:54 s2 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Oct 17 11:36:54 s2 kernel: block drbd0: disk( Failed -> Diskless ) 
Oct 17 11:36:54 s2 multipathd: drbd0: remove path (uevent)
Oct 17 11:36:54 s2 kernel: drbd infiniband: Terminating drbd_w_infiniba
Oct 17 11:36:54 s2 kernel: drbd: module cleanup done.
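
As far as I can tell, the -11 in the "sock_recvmsg returned -11" line above is a negated errno value, and errno 11 on Linux is EAGAIN. A minimal sketch for decoding such return codes, assuming it is run on a Linux host (errno numbering is platform-specific); the helper name decode_kernel_rc is only illustrative and not part of DRBD:

```python
import errno
import os


def decode_kernel_rc(rc):
    """Map a negative kernel-style return code to (errno name, message)."""
    code = -rc  # kernel functions conventionally return -errno on failure
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # -11 is the value from the "sock_recvmsg returned -11" log line.
    name, message = decode_kernel_rc(-11)
    print(name, "-", message)
```

This only names the error code; it does not explain why the receive on the SDP socket returns it.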


I'm using the latest CentOS 6 with the 2.6.32-573.7.1.el6.x86_64 kernel and
drbd84 from ELRepo.
Here is the resource configuration:

resource infiniband {
  device /dev/drbd0;
  meta-disk internal;
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  }
  startup {
    become-primary-on both;
  }
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  syncer {
    rate 1500M;
  }
  disk {
    # raid controller with battery back
    disk-flushes no;
  }
  on s1 {
    disk /dev/sdb;
    #address 192.168.11.1:7789;
    address sdp 192.168.11.1:7789;
  }
  on s2 {
    disk /dev/sdb;
    address sdp 192.168.11.2:7789;
  }
}

Additional info:
[root@s1 drbd.d]# lspci|grep -i mell
03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
 modinfo ib_sdp
filename:       /lib/modules/2.6.32-573.7.1.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/infiniband/ulp/sdp/ib_sdp.ko
license:        Dual BSD/GPL
description:    InfiniBand SDP module
author:         Michael S. Tsirkin
srcversion:     D046FDB330053923ED58690

Thanks for any help.
Best regards,
Nuno Fernandes