Greetings Megan,

One scenario that could cause this: your appliance-style Lustre MDS is a high-availability server pair, your mount command does not declare both NIDs, *and* the MGS and MDT resources currently reside on the MDS server you are not declaring.
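As a rough sketch of declaring both NIDs, the helper below joins every candidate MGS NID with ':' and appends the filesystem name. The function name lustre_mount_spec is hypothetical, and the addresses, filesystem name, and mount point are placeholders rather than values from your site. It only prints the resulting command so it can be checked before anything is mounted.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): build the device string for
# "mount -t lustre" from a filesystem name and one or more MGS NIDs.
lustre_mount_spec() {
    fsname=$1
    shift
    spec=""
    # Each failover NID is followed by ':'; the last ':' precedes "/fsname".
    for nid in "$@"; do
        spec="${spec}${nid}:"
    done
    printf '%s/%s\n' "$spec" "$fsname"
}

# Placeholder values; substitute the real NIDs of both HA servers.
spec=$(lustre_mount_spec somefsname A.B.C.D@tcp A.B.C.E@tcp)

# Print the command instead of running it, so it can be reviewed first.
echo "mount -t lustre $spec /localmountpoint"
```

Listing both NIDs costs nothing when the MGS is on the first server, and it lets the client try the second one when the resources have failed over.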
If it is high-availability and the IPs of those servers are A.B.C.D and A.B.C.E, then make sure your mount command looks something like:

mount -t lustre A.B.C.D@tcp:A.B.C.E@tcp:/somefsname /localmountpoint

That way the client will look for the MGS in all of the places it *could* be located.

This is just one possible cause. It is certainly easier and less painful than a lower-level version compatibility issue.

-Jeff

On Wed, Feb 28, 2018 at 13:36 Ms. Megan Larko <dobsonu...@gmail.com> wrote:

> Greetings List!
>
> We have been continuing to dissect our LNet environment between our
> lustre-2.7.0 clients and the lustre-2.7.18 servers. We have moved from the
> client node to the LNet server which bridges the InfiniBand (IB) and
> ethernet networks. As a test, we attempted to mount the ethernet Lustre
> storage from the LNet hopefully taking the IB out of the equation to limit
> the scope of our debugging.
>
> On the LNet router the attempted mount of Lustre storage fails. The LNet
> command line error on the test LNet client is exactly the same as the
> original client result:
> mount A.B.C.D@tcp0:/lustre at /mnt/lustre failed: Input/output error Is
> the MGS running?
>
> On the lustre servers, both the MGS/MDS and OSS we can see the error via
> dmesg:
> LNet: There was an unexpected network error while writing to C.D.E.F: -110
>
> and we see the periodic (~ every 10 to 20 minutes) in dmesg on MGS/MDS:
> Lustre: MGS: Client <id string> (at C.D.E.F@tcp) reconnecting
>
> The "lctl pings" in various directions are still successful.
>
> So, forget the end lustre client, we are not yet getting from MGS/MDS
> successfully to the LNet router.
> We have been looking at the contents of /sys/module/lustre.conf and we are
> not seeing any differences in set values between the LNet router we are
> using as a test Lustre client and the Lustre MGS/MDS server.
>
> As much as I'd _love_ to go to Lustre-2.10.x, we are dealing with both
> "appliance" style Lustre storage systems and clients tied to specific
> versions of the linux kernel (for reasons other than Lustre).
>
> Is there a key parameter which I could still be overlooking?
>
> Cheers,
> megan
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage