Hello, Phil!

I have never tried ConnectX-2 with the "repository" software versions, but my setup works well with Mellanox OFED <http://ru.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers>. AFAIK the latest OFED version (4.x) has dropped ConnectX-2 support, but you can try version 3.4. According to my notes, everything went well without any issues.

Any dmesg or syslog messages/issues? Distribution/kernel versions?

What I have done (on Ubuntu 16.04):

Installed Mellanox OFED (it has an automated installer, just run it from the ISO; if you have the most recent Linux distribution you'll probably need to turn off the version check with the appropriate installer option).

Put IPoIB into connected mode (it's in datagram mode by default) [I believe this might be the case]:

    sudo sed -i -e 's/SET_IPOIB_CM=auto/SET_IPOIB_CM=yes/g' /etc/infiniband/openib.conf

Configured opensm. I have a number of partitions to isolate traffic with different purposes:

    cat << 'EOF' | sudo tee /etc/opensm/partitions.conf
    # For reference:
    # IPv4 IANA reserved multicast addresses:
    #   http://www.iana.org/assignments/multicast-addresses/multicast-addresses.txt
    # IPv6 IANA reserved multicast addresses:
    #   http://www.iana.org/assignments/ipv6-multicast-addresses/ipv6-multicast-addresses.xml
    #
    # mtu =
    #   1 = 256
    #   2 = 512
    #   3 = 1024
    #   4 = 2048
    #   5 = 4096
    #
    # rate =
    #   2  =   2.5 GBit/s
    #   3  =  10   GBit/s
    #   4  =  30   GBit/s
    #   5  =   5   GBit/s
    #   6  =  20   GBit/s
    #   7  =  40   GBit/s
    #   8  =  60   GBit/s
    #   9  =  80   GBit/s
    #   10 = 120   GBit/s

    Default=0x7fff, rate=7, mtu=4, scope=2, defmember=full:
        ALL, ALL_SWITCHES=full;
    Default=0x7fff, ipoib, rate=7, mtu=4, scope=2:
        mgid=ff12:401b::ffff:ffff   # IPv4 Broadcast address
        mgid=ff12:401b::1           # IPv4 All Hosts group
        mgid=ff12:401b::2           # IPv4 All Routers group
        mgid=ff12:401b::16          # IPv4 IGMP group
        mgid=ff12:401b::fb          # IPv4 mDNS group
        mgid=ff12:401b::fc          # IPv4 Multicast Link Local Name Resolution group
        mgid=ff12:401b::101         # IPv4 NTP group
        mgid=ff12:401b::202         # IPv4 Sun RPC
        mgid=ff12:601b::1           # IPv6 All Hosts group
        mgid=ff12:601b::2           # IPv6 All Routers group
        mgid=ff12:601b::16          # IPv6 MLDv2-capable Routers group
        mgid=ff12:601b::fb          # IPv6 mDNS group
        mgid=ff12:601b::101         # IPv6 NTP group
        mgid=ff12:601b::202         # IPv6 Sun RPC group
        mgid=ff12:601b::1:3         # IPv6 Multicast Link Local Name Resolution group
        ALL=full, ALL_SWITCHES=full;

    Public=0x0003, rate=7, mtu=4, scope=2, defmember=full:
        ALL, ALL_SWITCHES=full;
    Public=0x0003, ipoib, rate=7, mtu=4, scope=2:
        mgid=ff12:401b::ffff:ffff   # IPv4 Broadcast address
        mgid=ff12:401b::1           # IPv4 All Hosts group
        mgid=ff12:401b::2           # IPv4 All Routers group
        mgid=ff12:401b::16          # IPv4 IGMP group
        mgid=ff12:401b::fb          # IPv4 mDNS group
        mgid=ff12:401b::fc          # IPv4 Multicast Link Local Name Resolution group
        mgid=ff12:401b::101         # IPv4 NTP group
        mgid=ff12:401b::202         # IPv4 Sun RPC
        mgid=ff12:601b::1           # IPv6 All Hosts group
        mgid=ff12:601b::2           # IPv6 All Routers group
        mgid=ff12:601b::16          # IPv6 MLDv2-capable Routers group
        mgid=ff12:601b::fb          # IPv6 mDNS group
        mgid=ff12:601b::101         # IPv6 NTP group
        mgid=ff12:601b::202         # IPv6 Sun RPC group
        mgid=ff12:601b::1:3         # IPv6 Multicast Link Local Name Resolution group
        ALL=full, ALL_SWITCHES=full;

    Storage=0x0004, rate=7, mtu=4, scope=2, defmember=full:
        ALL, ALL_SWITCHES=full;
    Storage=0x0004, ipoib, rate=7, mtu=4, scope=2:
        mgid=ff12:401b::ffff:ffff   # IPv4 Broadcast address
        mgid=ff12:401b::1           # IPv4 All Hosts group
        mgid=ff12:401b::2           # IPv4 All Routers group
        mgid=ff12:401b::16          # IPv4 IGMP group
        mgid=ff12:401b::fb          # IPv4 mDNS group
        mgid=ff12:401b::fc          # IPv4 Multicast Link Local Name Resolution group
        mgid=ff12:401b::101         # IPv4 NTP group
        mgid=ff12:401b::202         # IPv4 Sun RPC
        mgid=ff12:601b::1           # IPv6 All Hosts group
        mgid=ff12:601b::2           # IPv6 All Routers group
        mgid=ff12:601b::16          # IPv6 MLDv2-capable Routers group
        mgid=ff12:601b::fb          # IPv6 mDNS group
        mgid=ff12:601b::101         # IPv6 NTP group
        mgid=ff12:601b::202         # IPv6 Sun RPC group
        mgid=ff12:601b::1:3         # IPv6 Multicast Link Local Name Resolution group
        ALL=full, ALL_SWITCHES=full;

    Storage=0x0005, rate=7, mtu=4, scope=2, defmember=full:
        ALL, ALL_SWITCHES=full;
    Storage=0x0005, ipoib, rate=7, mtu=4, scope=2:
        mgid=ff12:401b::ffff:ffff   # IPv4 Broadcast address
        mgid=ff12:401b::1           # IPv4 All Hosts group
        mgid=ff12:401b::2           # IPv4 All Routers group
        mgid=ff12:401b::16          # IPv4 IGMP group
        mgid=ff12:401b::fb          # IPv4 mDNS group
        mgid=ff12:401b::fc          # IPv4 Multicast Link Local Name Resolution group
        mgid=ff12:401b::101         # IPv4 NTP group
        mgid=ff12:401b::202         # IPv4 Sun RPC
        mgid=ff12:601b::1           # IPv6 All Hosts group
        mgid=ff12:601b::2           # IPv6 All Routers group
        mgid=ff12:601b::16          # IPv6 MLDv2-capable Routers group
        mgid=ff12:601b::fb          # IPv6 mDNS group
        mgid=ff12:601b::101         # IPv6 NTP group
        mgid=ff12:601b::202         # IPv6 Sun RPC group
        mgid=ff12:601b::1:3         # IPv6 Multicast Link Local Name Resolution group
        ALL=full, ALL_SWITCHES=full;
    EOF

I believe in your case you need just the first block (the default partition, with key 0x7fff). Also check the rate id: I have QDR IB, so it's 7 (40 Gbit/s). With your SDR switch it would presumably be 3 (10 Gbit/s), going by the table in the comments above.

Enabled OpenSM (but you've already done that, since you are able to ibping nodes by GUID).

After that, set the IP addresses. In my case it's done like this (for every partition/VLAN):

    cat << 'EOF' | sudo tee /etc/network/interfaces.d/ib0.8003
    auto ib0.8003
    iface ib0.8003 inet static
        address 10.103.0.XXX
        netmask 255.255.0.0
        post-up ifconfig $IFACE mtu 65520
    EOF

Reboot the host, and after that:

    admin@e001n01:~$ ping -c 5 10.101.0.2
    PING 10.101.0.2 (10.101.0.2) 56(84) bytes of data.
    64 bytes from 10.101.0.2: icmp_seq=1 ttl=64 time=0.138 ms
    64 bytes from 10.101.0.2: icmp_seq=2 ttl=64 time=0.156 ms
    64 bytes from 10.101.0.2: icmp_seq=3 ttl=64 time=0.139 ms
    64 bytes from 10.101.0.2: icmp_seq=4 ttl=64 time=0.146 ms
    64 bytes from 10.101.0.2: icmp_seq=5 ttl=64 time=0.140 ms

    --- 10.101.0.2 ping statistics ---
    5 packets transmitted, 5 received, 0% packet loss, time 4072ms
    rtt min/avg/max/mdev = 0.138/0.143/0.156/0.016 ms

2017-12-19 12:35 GMT+05:00 Phil Schwarz <[email protected]>:

> Hi,
> I'm currently trying to set up a brand new home cluster :
> - 5 nodes, with each :
>
> - 1 HCA Mellanox ConnectX-2
> - 1 GB Ethernet (Proxmox 5.1 Network Admin)
> - 1 CX4 to CX4 cable
>
> All together connected to a SDR Flextronics IB Switch.
>
> This setup should back a Ceph Luminous (V12.2.2 included in proxmox
> V5.1)
>
> On all nodes, I did:
> - apt-get infiniband-diags
> - modprobe mlx4_ib
> - modprobe ib_ipoib
> - modprobe ib_umad
> - ifconfig ib0 IP/MASK
>
> On two nodes (tried previously on a single on, same issue), i installed
> opensm ( The switch doesn't have SM included) :
> apt-get install opensm
> /etc/init.d/opensm stop
> /etc/init.d/opensm start
> (Necessary to let the daemon create the logfiles)
>
> I tailed the logfile and got a "Active&Running" Setup, with "SUBNET UP"
>
> Every node is OK regardless to IB Setup :
> - All ib0 are UP, using ibstat
> - ibhosts and ibswitches seem to be OK
>
> On a node :
> ibping -S
>
> On every other node :
> ibping -G GID_Of_Previous_Server_Port
>
> I got a nice pong reply on every node. Should be happy, but...
> But i never went further.. Tried to ping each other. No way to get into
> this (mostly probably) simple issue...
>
>
> Any hint to achieve this task ??
>
>
> Thanks for all
> Best regards
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Best regards,
Vladimir Drobyshevskiy
Company "АйТи Город"
+7 343 2222192

IT consulting
Turnkey project delivery
IT services outsourcing
IT infrastructure outsourcing
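
A note on the interface name used above: an IPoIB child interface is named after its parent interface plus the partition key with the full-membership bit (0x8000) set, which is why partition 0x0003 ends up configured as ib0.8003. The sketch below only illustrates that arithmetic and, if any IPoIB interfaces exist on the host, prints their mode and MTU from sysfs so you can confirm connected mode is active before relying on MTU 65520. The helper name ipoib_child_name is made up for this example and is not part of OFED or opensm.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OFED): derive the IPoIB child
# interface name for a partition key on a given parent interface.
# The name carries the P_Key with the full-membership bit (0x8000) set.
ipoib_child_name() {
    parent=$1
    pkey=$2
    printf '%s.%04x\n' "$parent" $(( $pkey | 0x8000 ))
}

ipoib_child_name ib0 0x0003   # partition "Public" from partitions.conf
ipoib_child_name ib0 0x0004   # partition "Storage" from partitions.conf

# Show mode (datagram/connected) and MTU of every IPoIB interface, if any.
# On a host without IPoIB interfaces this loop prints nothing.
for dev in /sys/class/net/ib*; do
    [ -r "$dev/mode" ] || continue
    printf '%s: mode=%s mtu=%s\n' "${dev##*/}" "$(cat "$dev/mode")" "$(cat "$dev/mtu")"
done
```

The first two calls print ib0.8003 and ib0.8004. If an interface still reports mode=datagram, the large MTU from the post-up line will not take effect, because datagram mode is limited by the IB link MTU.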
