OK, thanks, I will try today.

Yours,
Yu

Cory Spitz <[email protected]> wrote on Sat, Jun 30, 2018 at 12:14 AM:

> FYI, there is a helpful guide to LNet setup at
> http://wiki.lustre.org/LNet_Router_Config_Guide. Despite the title, it
> is applicable to non-routed cases as well.
> -Cory
>
> --
>
> On 6/29/18, 1:06 AM, "lustre-discuss on behalf of Andreas Dilger"
> <[email protected] on behalf of [email protected]> wrote:
>
> On Jun 28, 2018, at 21:14, yu sun <[email protected]> wrote:
> >
> > All of the servers and clients mentioned above use netmask
> > 255.255.255.224, and they can ping each other, for example:
> >
> > [email protected]:~$ ping node28
> > PING node28 (10.82.143.202) 56(84) bytes of data.
> > 64 bytes from node28 (10.82.143.202): icmp_seq=1 ttl=61 time=0.047 ms
> > 64 bytes from node28 (10.82.143.202): icmp_seq=2 ttl=61 time=0.028 ms
> >
> > --- node28 ping statistics ---
> > 2 packets transmitted, 2 received, 0% packet loss, time 999ms
> > rtt min/avg/max/mdev = 0.028/0.037/0.047/0.011 ms
> > [email protected]:~$ lctl ping node28@o2ib1
> > failed to ping 10.82.143.202@o2ib1: Input/output error
> > [email protected]:~$
> >
> > We also have hundreds of GPU machines on different IP subnets. They
> > are in service and it is difficult to change the network structure.
> > Is there any material or documentation that can guide me to solve
> > this without changing the network structure?
>
> The regular IP "ping" is being routed by an IP router, but that doesn't
> work with IB networks, AFAIK. Either the IB interfaces need to be on the
> same subnet, or you need to have an IB interface configured on each
> subnet (which might get ugly if you have a large number of subnets), or
> you need to use LNet routers that are connected to each IB subnet to do
> the routing (each subnet would be a separate LNet network, for example
> 10.82.142.202@o2ib23 or whatever).
>
> The other option would be to use the IPoIB layer with socklnd (e.g.
> 10.82.142.202@tcp) but this would not run as fast as native verbs.
>
> Cheers, Andreas
>
> > Mohr Jr, Richard Frank (Rick Mohr) <[email protected]> wrote on Fri, Jun 29, 2018 at 3:30 AM:
> >
> > > On Jun 27, 2018, at 4:44 PM, Mohr Jr, Richard Frank (Rick Mohr) <[email protected]> wrote:
> > >
> > >
> > >> On Jun 27, 2018, at 3:12 AM, yu sun <[email protected]> wrote:
> > >>
> > >> client:
> > >> [email protected]:~$ mount -t lustre node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data
> > >> mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data failed: Input/output error
> > >> Is the MGS running?
> > >> [email protected]:~$ lctl ping node28@o2ib1
> > >> failed to ping 10.82.143.202@o2ib1: Input/output error
> > >> [email protected]:~$
> > >
> > > In your previous email, you said that you could mount lustre on the
> > > client ml-gpu-ser200.nmg01. Was that not accurate, or did something
> > > change in the meantime?
> >
> > (Note: Received out-of-band reply from Yu stating that there was a
> > typo in the previous email, and that client ml-gpu-ser200.nmg01 could
> > not mount lustre. Continuing discussion here so others on list can
> > follow/benefit.)
> >
> > Yu,
> >
> > For the IPoIB addresses used on your nodes, what are the subnets (and
> > netmasks) that you are using? It looks like servers use 10.82.143.X
> > and clients use 10.82.141.X. If you are using a 255.255.0.0 netmask,
> > you should be fine. But if you are using 255.255.255.0, then you will
> > run into problems. Lustre expects that all nodes on the same lnet
> > network (o2ib1 in your case) will also be on the same IP subnet.
> >
> > Have you tried running a regular “ping <IPoIB_address>” command
> > between clients and servers to make sure that part is working?
> >
> > --
> > Rick Mohr
> > Senior HPC System Administrator
> > National Institute for Computational Sciences
> > http://www.nics.tennessee.edu
> >
> > _______________________________________________
> > lustre-discuss mailing list
> > [email protected]
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> Cheers, Andreas
> ---
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud

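For anyone following along, here is a rough sketch of the subnet check discussed in the quoted thread: whether two IPoIB addresses fall in the same IP subnet under a given netmask. It uses only Python's standard ipaddress module. The server address 10.82.143.202 and the three netmasks come from the thread; the client address 10.82.141.10 is a made-up stand-in (the thread only gives 10.82.141.X), and the helper name same_subnet is likewise just for illustration. This only compares addresses; it does not query LNet or the fabric.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, netmask: str) -> bool:
    """Return True if both IPv4 addresses share a subnet under netmask."""
    # ip_interface accepts "address/netmask" and exposes the enclosing network.
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b


if __name__ == "__main__":
    server = "10.82.143.202"  # node28, from the thread
    client = "10.82.141.10"   # hypothetical client in 10.82.141.X
    for mask in ("255.255.0.0", "255.255.255.0", "255.255.255.224"):
        verdict = "same subnet" if same_subnet(server, client, mask) else "different subnets"
        print(f"{mask}: {verdict}")
```

If the check reports different subnets for the netmask actually in use, that would be consistent with Rick's point that nodes on one LNet network (o2ib1 here) are expected to share an IP subnet, and with Andreas's suggestion of separate LNet networks joined by LNet routers.
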
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
