I'm having trouble diagnosing where the problem lies in my Lustre
installation. The clients are 2.12.6, and I have two Lustre filesystems,
/home and /lustre.

/home has 4 OSTs and /lustre is made up of 6 OSTs. lfs df shows all OSTs as
ACTIVE.

The /lustre file system appears fine; I can *ls* into every directory.

When people log into the login node, it appears to lock up. I have shut down
everything and remounted the OSTs, MDTs, etc. in order with no errors
reported, but the lockup recurs soon after a few people log in.
The backend network is 100G Ethernet using ConnectX-5 cards and the OS is
CentOS 7.9. Everything was installed as RPMs and updates are disabled in
yum.conf.

Two questions to start with:
Is there a command-line tool to check each OST individually?
Apart from /var/log/messages, is there a Lustre-specific log I can monitor
on the login node to see errors when I hit /home?



Sid Young
_______________________________________________
lustre-discuss mailing list
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
