Hi All
OS: Red Hat 7.4
Lustre Version: Intel® Manager for Lustre* software 4.0.3.0

I have 72 OSTs across 6 OSS nodes with HA, and 1 MDT, serving 195 clients
over InfiniBand EDR.

After a client reboot, the Lustre filesystem mounts at startup. It should
show 2.1 PB, but it starts out at 350 TB.
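
As a rough cross-check of those sizes, here is a small Python sketch. It assumes the full filesystem is 2.1 PB (2100 TB in decimal units) and that all 72 OSTs are the same size, which I have not verified; the function name is only illustrative.

```python
# Rough estimate of how many OSTs are contributing capacity right after mount.
# Assumptions (not verified): equally sized OSTs, decimal units (1 PB = 1000 TB).

TOTAL_TB = 2100.0    # full filesystem size, 2.1 PB
OST_COUNT = 72       # number of OSTs in the filesystem
VISIBLE_TB = 350.0   # size reported right after the client mounts


def estimate_visible_osts(visible_tb, total_tb, ost_count):
    """Return the approximate number of OSTs behind the visible capacity."""
    per_ost_tb = total_tb / ost_count
    return visible_tb / per_ost_tb


per_ost = TOTAL_TB / OST_COUNT
visible = estimate_visible_osts(VISIBLE_TB, TOTAL_TB, OST_COUNT)
print(f"{per_ost:.1f} TB per OST, about {visible:.0f} OSTs visible at mount")
```

Under those assumptions, 350 TB corresponds to roughly one sixth of the OSTs contributing capacity at mount time.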

The "lfs osts" command shows about 90 percent of the even-numbered OSTs as
ACTIVE and gives no information about the other OSTs. After an hour or so,
all OSTs become active and the filesystem shows its full 2.1 PB.
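
To track which OSTs are missing over time, the "lfs osts" listing can be summarized with a short Python sketch like the one below. It assumes each OST line looks like "0: lustre-OST0000_UUID ACTIVE" after an "OBDS:" header; the sample text and the function name are made up for illustration and are not output from this system.

```python
import re

# Matches lines such as "0: lustre-OST0000_UUID ACTIVE" (format is an assumption).
OST_LINE = re.compile(r"^\s*(\d+):\s+(\S+)\s+(\S+)\s*$")


def summarize_osts(lfs_osts_output, expected_count):
    """Return (active_indices, missing_indices) from `lfs osts` text output.

    An index counts as missing if it is not listed at all or is listed
    with any state other than ACTIVE.
    """
    states = {}
    for line in lfs_osts_output.splitlines():
        match = OST_LINE.match(line)
        if match:
            states[int(match.group(1))] = match.group(3)
    active = sorted(idx for idx, state in states.items() if state == "ACTIVE")
    missing = [idx for idx in range(expected_count) if idx not in active]
    return active, missing


# Illustrative sample only, not real output from the cluster described here.
sample = """OBDS:
0: lustre-OST0000_UUID ACTIVE
2: lustre-OST0002_UUID ACTIVE
3: lustre-OST0003_UUID INACTIVE
"""

active, missing = summarize_osts(sample, expected_count=4)
print("active:", active)
print("missing or not active:", missing)
```

Running this against the real listing every few minutes, with expected_count=72, would show whether the missing OSTs come back one server at a time or all at once.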


dmesg on a Lustre OSS server:
LustreError: 137-5: lustre-OST0009_UUID: not available for connect from
10.0.0.130@o2ib (no target). If you are running an HA pair check that the
target is mounted on the other server.

dmesg on client:
LNet: 5419:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for
10.0.0.5@o2ib: 15 seconds
Lustre: 5546:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent
has failed due to network error: [sent 1542009416/real 1542009426]
req@ffff885f47610000 x1616909446641136/t0(0)
o8->[email protected]@o2ib:28/4 lens 520/544 e 0
to 1 dl 1542009696 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1

I tested InfiniBand with ib_send_lat and ib_read_lat; no errors occurred.
I tested LNet with "lctl ping 10.0.0.8@o2ib"; no errors occurred:
12345-0@lo
12345-10.51.22.8@o2ib

Why are some OSTs accessible while others on the same server are not?
Best Regards.
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
