It will continue to drop as the number of files in the directory increases. Interestingly, GPFS stat performance increased as the number of files increased. My tests were on 128 nodes * 8 processes/node * 10-500 files per process.
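(For scale, that works out to 128 * 8 * 10 = 10,240 files at the low end and 128 * 8 * 500 = 512,000 files at the high end. And for anyone who wants to repeat Mike's test below: bonnie++'s -n option takes its file count in multiples of 1024 files, so a 20k-file run of 40 KB files would look roughly like "bonnie++ -d /lustre/testdir -s 0 -n 20:40960:40960:1" -- the directory and exact flags there are my guess, not Mike's actual command line.)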
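If anyone wants to measure raw stat rates outside of bonnie++, a per-process loop along these lines is enough. This is only an illustrative sketch, not the harness that produced the numbers above, and the f.<N> file naming is an assumption:

    /* statrate.c - time stat() over files "f.0" .. "f.<nfiles-1>" in a
     * directory and report files/sec.
     * Build: gcc -std=c99 -O2 -o statrate statrate.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/time.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <dir> <nfiles>\n", argv[0]);
            return 1;
        }
        long n = atol(argv[2]);
        char path[4096];
        struct stat sb;
        struct timeval t0, t1;

        gettimeofday(&t0, NULL);
        for (long i = 0; i < n; i++) {
            /* stat each file; this never touches file data, matching
             * the "read" phase of the bonnie++ runs quoted below */
            snprintf(path, sizeof(path), "%s/f.%ld", argv[1], i);
            if (stat(path, &sb) != 0)
                perror(path);
        }
        gettimeofday(&t1, NULL);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%ld stats in %.2f s = %.0f files/sec\n", n, secs, n / secs);
        return 0;
    }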
- Richard

On 9/10/10 11:11 AM, "Michael Robbert" <[email protected]> wrote:

> We have been struggling with our Lustre performance for some time now,
> especially with large directories. I recently did some informal
> benchmarking (on a live system, so I know the results are not
> scientifically valid) and noticed a huge drop in the performance of reads
> (stat operations) past 20k files in a single directory. I'm using bonnie++
> with IO testing disabled (-s 0), just creating, reading, and deleting 40 KB
> files in a single directory. I've done this for directory sizes of 2,000
> to 40,000 files. Create performance is a flat line of ~150 files/sec
> across the board. Delete performance is all over the place, but no higher
> than 3,000 files/sec. The really interesting data point is read
> performance, which for these tests is just a stat of the file, not a read
> of its data. For the smaller directories it is relatively consistent at
> just below 2,500 files/sec, but when I jump from 20,000 files to 30,000
> files the performance drops to around 100 files/sec. We were assuming this
> was somewhat expected behavior and are in the process of trying to get our
> users to change their code. Then yesterday I was browsing the Lustre
> Operations Manual and found section 33.8, which says Lustre has been
> tested with as many as 10 million files in a single directory and still
> gets lookups at a rate of 5,000 files/sec. That leaves me wondering two
> things: how can we get 5,000 files/sec for anything, and why does our
> performance drop off so suddenly after 20k files?
>
> Here is our setup:
> All IO servers are Dell PowerEdge 2950s: two quad-core X5355 sockets @
> 2.66GHz (8 cores) and 16 GB of RAM.
> The data is on DDN S2A 9550s with an 8+2 RAID configuration, connected
> directly with 4Gb Fibre Channel.
> They are running RHEL 4.5, Lustre 1.6.7.2-ddn3, kernel
> 2.6.18-128.7.1.el5.ddn1.l1.6.7.2.ddn3smp.
>
> As a side note, the users' code is Parflow, developed at LLNL. The files
> are SILO files. We have as many as 1.4 million files in a single
> directory, and we now have half a billion files that we need to deal with
> in one way or another. The code has already been modified to split the
> files on newer runs into multiple subdirectories, but we're still dealing
> with tens of thousands of files in a single directory. The users have been
> able to run these data sets on Lustre systems at LLNL 3 orders of
> magnitude faster.
>
> Thanks,
> Mike Robbert
> HPC & Networking Engineer
> Colorado School of Mines

====================================================
Richard Hedges
Customer Support and Test - File Systems Project
Development Environment Group - Livermore Computing
Lawrence Livermore National Laboratory
7000 East Avenue, MS L-557
Livermore, CA 94551
v: (925) 423-2699
f: (925) 423-6961
E: [email protected]
