Dear Richard,

Thank you very much for the reply (somehow my email filter ate it, so I only 
saw it in Mark's quotation). Yes, your analysis seems to explain our situation. 
We do see this when there is predominantly "read" activity on one of the OSTs. 
The volume of reads was actually small, which is why we couldn't even identify 
the application responsible. That would be consistent with a genuine blocking 
situation rather than a throughput problem.


In our configuration the number of OSTs is small (13), there is zero load on 
the MDS, and the stripe count is 1. The system is in production and about 60% 
full, so I wonder what the best strategy would be for changing the striping 
now. I understand that if I just change the stripe count on the Lustre root 
directory, it will affect only newly created files and directories. Should I 
copy the users' files out, set striping on their directories, and then copy the 
data back? That sounds somewhat risky, especially if users do unusual things 
with symlinks.
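A possible alternative to copying data out and back is to rewrite each file in place: set the new default striping on a directory first (for example with lfs setstripe), then copy each file to a temporary name in the same directory, where it inherits the directory's new default layout, and rename it over the original. Below is a minimal Python sketch of that copy-and-rename pass; restripe_tree is a hypothetical helper, not a Lustre tool. It assumes the directory defaults are already set, that no job has the files open while they are rewritten, and that there is free space for one extra copy of the largest file. Symlinks are never followed or copied, and files with more than one hard link are skipped, since replacing one name would split the link set.

```python
import os
import shutil
import stat
import tempfile


def restripe_tree(root: str, dry_run: bool = False) -> list:
    """Rewrite regular files under `root` by copy-then-rename (hypothetical helper).

    Assumes each directory's default striping was already changed, so a file
    newly created in that directory picks up the new layout.
    Returns the paths that were (or, with dry_run, would be) rewritten.
    """
    rewritten = []
    # os.walk does not descend into symlinked directories (followlinks=False).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: look at a symlink itself, not its target
            if not stat.S_ISREG(st.st_mode) or st.st_nlink > 1:
                continue  # skip symlinks, special files and hard-linked files
            rewritten.append(path)
            if dry_run:
                continue
            # The temp file lives in the same directory, so it gets that
            # directory's (new) default layout and the rename stays atomic.
            fd, tmp = tempfile.mkstemp(prefix=".restripe.", dir=dirpath)
            try:
                with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
                    shutil.copyfileobj(src, dst, 4 * 1024 * 1024)
                try:
                    os.chown(tmp, st.st_uid, st.st_gid)  # other owners need root
                except PermissionError:
                    pass
                # Copy mode and timestamps only; xattrs (and so ACLs) are not copied.
                os.chmod(tmp, stat.S_IMODE(st.st_mode))
                os.utime(tmp, ns=(st.st_atime_ns, st.st_mtime_ns))
                os.replace(tmp, path)
            except BaseException:
                if os.path.exists(tmp):
                    os.unlink(tmp)
                raise
    return rewritten
```

Running it with dry_run=True first lists what would be rewritten without touching anything. Extended attributes are deliberately not copied, because on Lustre the file layout can itself be exposed as an extended attribute and copying it could carry the old striping along; the trade-off is that POSIX ACLs, which are also stored as extended attributes, would need to be restored separately. Checking a few rewritten files with lfs getstripe is a sensible sanity check.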

 -- 
Grigory Shamov
HPC Analyst, Westgrid/Compute Canada
E2-588 EITC Building, University of Manitoba
(204) 474-9625


--- On Fri, 12/7/12, Mark Day <[email protected]> wrote:

From: Mark Day <[email protected]>
Subject: Re: [Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
To: "Mohr Jr, Richard Frank (Rick Mohr)" <[email protected]>
Cc: [email protected], "Grigory Shamov" <[email protected]>
Date: Friday, December 7, 2012, 4:22 PM

> 2) Make sure caching is enabled on the oss.

How do you check or enable this? Is it not enabled by default?

Cheers, Mark

From: "Mohr Jr, Richard Frank (Rick Mohr)" <[email protected]>
To: "Grigory Shamov" <[email protected]>
Cc: [email protected]
Sent: Saturday, 8 December, 2012 5:19:31 AM
Subject: Re: [Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?

On Dec 6, 2012, at 2:58 PM, Grigory Shamov wrote:

> So, on one of our OSS servers the load is now 160. According to collectl, 
> only one OST is doing most of the work. (We don't stripe on this FS, unless 
> users do it manually on their subdirectories.)

This sounds similar to situations we see every now and then. The load on the 
OSS climbs until it is roughly equal to the number of OSS threads (which sounds 
like your case, with load = OSS threads = 160), but only a single OST is 
performing any significant IO. This seems to arise when parallel jobs access 
the same file with stripe_count=1. The OSS is bombarded with so many requests 
to a single OST that they back up and tie up all the OSS threads. At that 
point, all IO to the OSS slows to a crawl no matter which of its OSTs is being 
used. This becomes problematic because even a modestly sized job can 
effectively DoS an OSS.

When you encounter these problems, is the IO to the affected OST primarily 
one-way (i.e., mostly reads or mostly writes)? In our cases, we tend to see 
this when parallel jobs are reading from a common file. A couple of things I 
have found help:

1) Increase the file striping a lot. This spreads the load over more OSTs. We 
have had success striping even relatively small files (~10 GB) over 100+ OSTs. 
Not only does it reduce the load on the OSS, but it usually speeds up the 
application significantly.

2) Make sure caching is enabled on the OSS. For us, this seems to help mostly 
when lots of processes are reading the same file.
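Point 1 follows from how Lustre stripes a file: it is laid out RAID-0 style, with consecutive stripe_size chunks assigned round-robin to the file's stripe_count OST objects. The Python sketch below assumes that layout and a 1 MiB stripe size (the usual default); the function names are illustrative only, and since it counts just the stripe each request starts in, it assumes requests do not cross stripe boundaries. It shows why a file with stripe_count=1 sends every read to one OST, while the same reads spread evenly once the file is striped.

```python
from collections import Counter

MiB = 1024 * 1024  # assumed stripe size; 1 MiB is the usual default


def stripe_index(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Return which of the file's stripes (0 .. stripe_count-1) holds `offset`.

    RAID-0 style: chunk k of stripe_size bytes goes to stripe k % stripe_count.
    """
    return (offset // stripe_size) % stripe_count


def requests_per_stripe(file_size: int, request_size: int,
                        stripe_size: int, stripe_count: int) -> Counter:
    """Count how many sequential requests over the whole file start in each stripe."""
    hits = Counter()
    for offset in range(0, file_size, request_size):
        hits[stripe_index(offset, stripe_size, stripe_count)] += 1
    return hits


# 64 MiB file read in 1 MiB requests (e.g. by many ranks, one chunk each).
print(requests_per_stripe(64 * MiB, MiB, MiB, 1))  # Counter({0: 64})
print(requests_per_stripe(64 * MiB, MiB, MiB, 8))
```

With stripe_count=8, each of the eight stripes receives 8 of the 64 requests instead of one OST receiving all of them. Which physical OST backs each stripe index depends on the file's layout, which lfs getstripe reports.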

I'm not sure your situation is exactly like what I have seen, but maybe some of 
this info will help a bit.

-- 
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
