On 2010-12-10, at 12:42, Brock Palen wrote:
> We have a Lustre 1.6.x filesystem,
> 
> 4 OSSes: 3 Sun x4500s and 1 DDN S2A6620
> 
> Each OSS has 4 bonded 1GigE interfaces, or 1 10GigE interface.
> 
> I have a user running a few hundred serial jobs that all access the same 
> 16GB file.  We striped the file over all the OSTs, and throughput is capped at 
> 500-600MB/s no matter how many hosts are running.  IO per OST is around 
> 15-20MB/s (31 OSTs total).

What is the IO request size?  Are all the clients both reading and writing this 
same file?  Presumably you see better performance when fewer jobs are running 
against the filesystem?
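
If it isn't obvious what IO size the application is issuing, the per-OSC RPC 
histograms on a client give a quick answer.  A rough sketch (the /proc paths 
below are from the 1.6/1.8 tree and may differ on your version):

  # On one of the client nodes running the job, the "pages per rpc"
  # histogram shows how large the read RPCs actually are:
  for f in /proc/fs/lustre/osc/*/rpc_stats; do
      echo "== $f =="
      grep -A 20 "pages per rpc" "$f"
  done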

> This set of jobs keeps reading in the same data set, and has been running for 
> about 24 hours (the group of about 900 total jobs).
> 
> *  Is there a recommendation for a better way to do these sorts of jobs?  The 
> compute nodes have 48GB of RAM; he does not use much RAM for the job, just all 
> the IO.

I agree with Cliff that the 1.8 OSS read cache will probably help the 
performance in this case.  OSS read cache does not need a client-side upgrade 
to work, though of course I'd suggest upgrading the clients anyway.

1.8.5 was just released this week...
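
Once the OSSes are on 1.8, it is worth confirming that the read cache is 
actually enabled and allowed to hold a file this large.  A sketch, assuming the 
usual 1.8.x obdfilter parameter names (check what your release exposes):

  # on each OSS
  lctl get_param obdfilter.*.read_cache_enable
  lctl get_param obdfilter.*.readcache_max_filesize
  # enable the cache if it is off
  lctl set_param obdfilter.*.read_cache_enable=1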

> * Is there a better way to tune?  What should I be looking for to tune?

Start by looking at /proc/fs/lustre/obdfilter/*/brw_stats on the OSTs.  It 
should be reset before the job (echo 0 to each file) so you get stats relevant 
to that job only.  You can also check iostat on the OSS nodes to see how busy 
the disks are.  The OSTs may be imbalanced since they sit on different hardware, 
and a file striped across all of them will only go as fast as the slowest OSTs.
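
A sketch of that workflow on an OSS (paths from the 1.6/1.8 proc tree; the 
iostat options are just the usual extended per-device stats):

  # reset the per-OST histograms before the job starts
  for f in /proc/fs/lustre/obdfilter/*/brw_stats; do
      echo 0 > "$f"
  done
  # ... let the job run, then read back the IO size / fragmentation histograms
  cat /proc/fs/lustre/obdfilter/*/brw_stats
  # and watch per-disk utilization while the job is running
  iostat -x 5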

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
