> On Jun 27, 2018, at 4:44 PM, Mohr Jr, Richard Frank (Rick Mohr) 
> <[email protected]> wrote:
> 
> 
>> On Jun 27, 2018, at 3:12 AM, yu sun <[email protected]> wrote:
>> 
>> client:
>> [email protected]:~$ mount -t lustre 
>> node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data
>> mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data 
>> failed: Input/output error
>> Is the MGS running?
>> [email protected]:~$ lctl ping node28@o2ib1
>> failed to ping 10.82.143.202@o2ib1: Input/output error
>> [email protected]:~$
> 
> In your previous email, you said that you could mount lustre on the client 
> ml-gpu-ser200.nmg01.  Was that not accurate, or did something change in the 
> meantime?

(Note: Received an out-of-band reply from Yu stating that there was a typo in 
the previous email, and that client ml-gpu-ser200.nmg01 could not mount Lustre.  
Continuing the discussion here so others on the list can follow/benefit.)

Yu,

For the IPoIB addresses used on your nodes, what are the subnets (and netmasks) 
that you are using?  It looks like the servers use 10.82.143.X and the clients 
use 10.82.141.X.  If you are using a 255.255.0.0 netmask, you should be fine.  
But if you are using 255.255.255.0, then you will run into problems, because 
10.82.143.X and 10.82.141.X would then be two different subnets.  Lustre 
expects that all nodes on the same LNet network (o2ib1 in your case) will also 
be on the same IP subnet.
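
To make the netmask point concrete, here is a small sketch using Python's 
standard ipaddress module.  The server address is the one from your lctl ping 
output; the client address 10.82.141.10 is just a made-up example in the 
10.82.141.X range, and same_subnet is a helper name I picked for illustration.  
It only compares network prefixes and does not query Lustre or LNet in any way.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """Return True if both addresses fall in the same network for this netmask."""
    # strict=False lets us pass a host address and have the host bits masked off.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b


server = "10.82.143.202"  # from the lctl ping output in this thread
client = "10.82.141.10"   # hypothetical client address, for illustration only

for mask in ("255.255.0.0", "255.255.255.0"):
    print(mask, same_subnet(server, client, mask))
```

With 255.255.0.0 both addresses land in 10.82.0.0/16, while with 255.255.255.0 
they end up in 10.82.143.0/24 and 10.82.141.0/24 respectively.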

Have you tried running a regular “ping <IPoIB_address>” command between clients 
and servers to make sure that part is working?

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
