>> I think here it should be a colon between the two MGS nids:
>> mount -v -t lustre 172.16.10.1@o2ib0:192.168.10.1@tcp0:/testfs
That's part of my problem. The Lustre 2.x manual says that comma-delimited NIDs are on the same host, but colon-delimited NIDs are on separate hosts. Is that just for lustre.conf & mkfs.lustre, or is it for mount operations as well?

In this case, my MGS node has a TCP and an IB rail to accommodate the different clients, so I'd use a comma, right?

On Mon, Sep 28, 2015 at 7:07 AM, Martin Hecht <[email protected]> wrote:

> On 09/27/2015 08:59 PM, Exec Unerd wrote:
> >> I'm not sure if I have understood your setup correctly.
> > In this case, the clients are a combination of all three: some are o2ib
> > only, some tcp only, and some o2ib+tcp with tcp as failover.
> >
> > It sounds like I need a combination of configurations, one for the OSSes
> > and one for each client type.
> >
> > So if I used this parameter in the OST,
> > --mgsnode="172.16.10.1@o2ib0,192.168.10.1@tcp0"
> >
> > Then configured the modprobe.d/lustre.conf appropriately on the clients
> > tcp: options lnet networks="tcp0(ixgbe1)"
> > o2ib: options lnet networks="o2ib0(ib1)"
> > both: options lnet networks="o2ib0(ib1),tcp0(ixgbe1)"
> >
> > And use these mount parameters:
> > tcp: mount -v -t lustre 192.168.10.1@tcp0:/testfs /mnt/testfs
> > o2ib: mount -v -t lustre 172.16.10.1@o2ib0:/testfs /mnt/testfs
> > both: mount -v -t lustre 172.16.10.1@o2ib0,192.168.10.1@tcp0:/testfs
> I think here it should be a colon between the two MGS nids:
>
> mount -v -t lustre 172.16.10.1@o2ib0:192.168.10.1@tcp0:/testfs
>
> > /mnt/testfs
> >
> > Everything should be happy?
> >
> > On Thu, Sep 24, 2015 at 9:12 AM, Martin Hecht <[email protected]> wrote:
> >
> >> On 09/24/2015 05:33 PM, Chris Hunter wrote:
> >>> [...]
> >>>> 2. What's the best way to trace the TCP client interactions to see
> >>>> where
> >>>> it's breaking down?
> >>> If lnet is running on the client, you can try "lctl ping"
> >>> eg) lctl ping 172.16.10.1@o2ib
> >>>
> >>> I believe a lustre mount uses ipoib for initial handshake with a mds
> >>> o2ib interfaces. You should make sure regular ping over ipoib is
> >>> working before mounting lustre.
> >> if the client and the server is on the same network, yes, it's a good
> >> starting point. But it's not a prerequisite. In general you can have an
> >> lnet router in-between or have different ip subnets for ipoib, so you
> >> can't ping on the ipoib layer, but you can still lctl ping the whole
> >> path (although you could verify that you can ip ping to the next hop at
> >> least).
> >>
> >> We also have a case in which we tried to block ipoib completely with
> >> iptables, but we still could lctl ping, even after rebooting the host
> >> and ensuring that the firewall was up before loading the lnet module.
> >> So, I doubt that ipoib is needed at all for establishing the o2ib
> >> connection.
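
For reference, the three client cases discussed above can be written down as a small dry-run script. This is only a sketch: it prints the "lctl ping" and "mount" commands and does not run them. The NIDs, filesystem name and mount point are the ones quoted in the thread; the function names (mgs_spec, mount_cmd, ping_cmds) are made up for illustration. The "both" case uses a comma, on the assumption that the mount.lustre grammar treats comma-separated NIDs as one MGS node and colon-separated entries as separate (failover) MGS nodes. The thread itself leaves that point open, so check it against the mount.lustre man page for your release.

```shell
#!/bin/sh
# Dry-run sketch: prints commands only, nothing is mounted or pinged.
# Values below are taken from the thread; function names are hypothetical.
MGS_IB="172.16.10.1@o2ib0"
MGS_TCP="192.168.10.1@tcp0"
FSNAME="testfs"
MNT="/mnt/testfs"

# Print the MGS spec ("nid[,nid]:/fsname") for a client type.
mgs_spec() {
    case "$1" in
        tcp)  printf '%s:/%s\n' "$MGS_TCP" "$FSNAME" ;;
        o2ib) printf '%s:/%s\n' "$MGS_IB" "$FSNAME" ;;
        # ASSUMPTION: comma = two NIDs of the same MGS node (one host, two rails).
        both) printf '%s,%s:/%s\n' "$MGS_IB" "$MGS_TCP" "$FSNAME" ;;
        *)    echo "unknown client type: $1" >&2; return 1 ;;
    esac
}

# Print the mount command for a client type.
mount_cmd() {
    spec=$(mgs_spec "$1") || return 1
    printf 'mount -v -t lustre %s %s\n' "$spec" "$MNT"
}

# Print the "lctl ping" checks worth running before the mount.
ping_cmds() {
    case "$1" in
        tcp)  printf 'lctl ping %s\n' "$MGS_TCP" ;;
        o2ib) printf 'lctl ping %s\n' "$MGS_IB" ;;
        both) printf 'lctl ping %s\n' "$MGS_IB" "$MGS_TCP" ;;
        *)    echo "unknown client type: $1" >&2; return 1 ;;
    esac
}

for t in tcp o2ib both; do
    ping_cmds "$t"
    mount_cmd "$t"
done
```

Running the printed "lctl ping" lines first follows the advice quoted above: if LNet cannot reach the MGS NID, the mount will not succeed either, regardless of whether plain IP ping over ipoib works.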
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
