Good Morning,

   I'm in the early phases of a new Lustre file system design.  One thing we
are observing on our Lustre file system from 2019 (running 2.12.9-1) is that
the OSS systems report a lot of time spent in I/O wait.  We are configured with
two OSS systems connected via SAS cables to a pair of JBODs, and are wondering
whether going from two JBODs per OSS (so 6 OSTs per OSS under normal operating
conditions, 12 under failover) to four shelves (12 OSTs per OSS under normal
operating conditions, 24 under failover) would be viable.  The OSTs are
10-drive raidz2 vdevs, and we are planning on using 20TB drives in this new
file system.
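For reference, a single OST of that shape might be laid out roughly as follows (pool name, device names, and properties are placeholders, not our actual configuration; ashift and other settings would depend on the drives chosen):

```shell
# Hypothetical sketch: one OST backed by a zpool with a single
# 10-drive raidz2 vdev.  Device names sdb..sdk and the pool name
# "ost0pool" are made up for illustration.
zpool create -o ashift=12 ost0pool \
    raidz2 sdb sdc sdd sde sdf sdg sdh sdi sdj sdk
```

With four shelves per OSS, each server would carry 12 such pools in normal operation and up to 24 after a failover.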

Has anyone tried the 4 shelf / 2 oss configuration?

Reading through the Lustre manual, I see the following in Table 1.2:

"OSS: 1-128 TiB per OST, 1-8 OSTs per OSS"

Is that an indication that more than 8 OSTs per OSS causes problems for the OSS
systems?  Our current OSS systems have run with 12 OSTs during failover
situations, once for at least a few days due to a hardware failure on one of
the OSS systems.

respectfully,

Kurt J. Strosahl (he/him)
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility