On May 15, 2007 13:03 -0400, pauln wrote:
> I've attached a spreadsheet containing data from a Lustre create test
> which I ran some time ago. The purpose of the test was to determine how
> different hardware configs affected create performance. As you'll see
> from the data, the OST is actually the slowest component in the create
> chain. I tested several OST and MDS configs and found that every
> disk-based OST configuration was susceptible to lengthy operation times
> interspersed throughout the test. This periodic slowness was correlated
> with disk activity on the OST; at the time I suspected that the
> activity was on behalf of the journal. Moving the entire OST onto a
> ramdisk increased performance substantially.
Paul,
what version of Lustre were you testing? How large are the ext3 inodes on
the OSTs (can be seen with "dumpe2fs -h /dev/{ostdev}")? What is the
default stripe count?
If you are running 1.4.6 and the ext3 inode size is 128 bytes, then there
can be a significant performance hit due to extra metadata being stored
on the OSTs. This is not an issue for filesystems formatted with a newer
Lustre release.
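
To check both, something like the following should do (the device and
mount point here are just examples):

    # on an OSS node, dump the superblock header and look for the inode size
    dumpe2fs -h /dev/sdb1 2>/dev/null | grep -i 'inode size'

    # on a client, show the stripe settings actually in effect for a
    # directory (files created there inherit them)
    lfs getstripe /mnt/lustre/testdir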
> What I did not try was moving only the journal onto a ramdisk; it's
> possible that this would decrease the frequency of the OST's slow
> operations. If that is the case, you may be able to increase your
> create performance by purchasing a solid-state device for your OST
> journals.
>
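For anyone who wants to try that, here is a rough plain-ext3 sketch of
setting up an external journal (the device names are illustrative, both
commands destroy existing data, and a real Lustre setup would need the
equivalent options passed through its own formatting tools):

    # format the solid-state (or ramdisk) device as an external journal
    mke2fs -O journal_dev /dev/ssd0

    # create the OST filesystem pointing at that external journal
    mke2fs -J device=/dev/ssd0 /dev/ostdev
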
> There are also some numbers for IB (openibnld) included in the
> spreadsheet. I found that using IB lowered the operation time by
> 100-200 usecs, so it's true that switching to IB will speed things up.
>
> I've also attached the raw data from Test1 (config1 and config13). Each
> line of the raw data is the operation latency in seconds; the operation
> number equals the line number.
> paul
>
>
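A quick way to scan raw files like these for the slow outliers Paul
mentions above (the threshold and filename are just examples):

    # print operation number (== line number) and latency for ops over 100ms
    awk '$1 > 0.1 { print NR, $1 }' test1-config1.dat
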
> Daniel Leaberry wrote:
> >I've closely followed the metadata mailing list posts over the last
> >year. We've been running our small filesystem for a couple of months
> >in semi-production mode. We don't have a traditional HPC workload
> >(it's big image files with 5-10 small XML files each), and we knew
> >that Lustre didn't excel at small files.
> >
> >I ordered the beefiest MDS I could (quad-socket, dual-core Opterons
> >with 16GB RAM) and put it on the fastest array I could (a 14-drive
> >RAID 10 with 15k RPM disks). Still, as always, I'm wondering if I can
> >do better.
> >
> >Everything runs over TCP/IP with no jumbo frames. My standard test is
> >simply to track how many opens we do every 10 seconds: I run the
> >command below and keep track of the results. We never really exceed
> >~2000 opens/second. Our workload often involves downloading ~50000
> >small (4-10k) XML files as fast as possible.
> >
> >I'm just interested in what other Lustre gurus have to say about my
> >results. I've tested different lru_size values (they make little
> >difference), and portals debug is off. My understanding is that the
> >biggest performance increase I would see is from moving to InfiniBand
> >instead of TCP interconnects.
> >
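(For reference, the lru_size Daniel mentions is tuned per LDLM namespace
under /proc; a sketch, with the value just an example:

    # raise the client-side lock LRU size on all namespaces
    for f in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do
        echo 1200 > $f
    done
)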
> >Thanks,
> >[EMAIL PROTECTED] lustre]# echo 0 >
> >/proc/fs/lustre/mds/lustre1-MDT0000/stats; sleep 10; cat
> >/proc/fs/lustre/mds/lustre1-MDT0000/stats
> >snapshot_time 1179180513.905326 secs.usecs
> >open 14948 samples [reqs]
> >close 7456 samples [reqs]
> >mknod 0 samples [reqs]
> >link 0 samples [reqs]
> >unlink 110 samples [reqs]
> >mkdir 0 samples [reqs]
> >rmdir 0 samples [reqs]
> >rename 99 samples [reqs]
> >getxattr 0 samples [reqs]
> >setxattr 0 samples [reqs]
> >iocontrol 0 samples [reqs]
> >get_info 0 samples [reqs]
> >set_info_async 0 samples [reqs]
> >attach 0 samples [reqs]
> >detach 0 samples [reqs]
> >setup 0 samples [reqs]
> >precleanup 0 samples [reqs]
> >cleanup 0 samples [reqs]
> >process_config 0 samples [reqs]
> >postrecov 0 samples [reqs]
> >add_conn 0 samples [reqs]
> >del_conn 0 samples [reqs]
> >connect 0 samples [reqs]
> >reconnect 0 samples [reqs]
> >disconnect 0 samples [reqs]
> >statfs 27 samples [reqs]
> >statfs_async 0 samples [reqs]
> >packmd 0 samples [reqs]
> >unpackmd 0 samples [reqs]
> >checkmd 0 samples [reqs]
> >preallocate 0 samples [reqs]
> >create 0 samples [reqs]
> >destroy 0 samples [reqs]
> >setattr 389 samples [reqs]
> >setattr_async 0 samples [reqs]
> >getattr 3467 samples [reqs]
> >getattr_async 0 samples [reqs]
> >brw 0 samples [reqs]
> >brw_async 0 samples [reqs]
> >prep_async_page 0 samples [reqs]
> >queue_async_io 0 samples [reqs]
> >queue_group_io 0 samples [reqs]
> >trigger_group_io 0 samples [reqs]
> >set_async_flags 0 samples [reqs]
> >teardown_async_page 0 samples [reqs]
> >merge_lvb 0 samples [reqs]
> >adjust_kms 0 samples [reqs]
> >punch 0 samples [reqs]
> >sync 0 samples [reqs]
> >migrate 0 samples [reqs]
> >copy 0 samples [reqs]
> >iterate 0 samples [reqs]
> >preprw 0 samples [reqs]
> >commitrw 0 samples [reqs]
> >enqueue 0 samples [reqs]
> >match 0 samples [reqs]
> >change_cbdata 0 samples [reqs]
> >cancel 0 samples [reqs]
> >cancel_unused 0 samples [reqs]
> >join_lru 0 samples [reqs]
> >init_export 0 samples [reqs]
> >destroy_export 0 samples [reqs]
> >extent_calc 0 samples [reqs]
> >llog_init 0 samples [reqs]
> >llog_finish 0 samples [reqs]
> >pin 0 samples [reqs]
> >unpin 0 samples [reqs]
> >import_event 0 samples [reqs]
> >notify 0 samples [reqs]
> >health_check 0 samples [reqs]
> >quotacheck 0 samples [reqs]
> >quotactl 0 samples [reqs]
> >ping 0 samples [reqs]
> >
> >
>
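For what it's worth, the opens/sec number can be pulled straight out of
that stats output; a sketch using the same path as in Daniel's command:

    # zero the counters, wait 10 seconds, then report the open rate
    echo 0 > /proc/fs/lustre/mds/lustre1-MDT0000/stats
    sleep 10
    awk '$1 == "open" { printf "%d opens/sec\n", $2 / 10 }' \
        /proc/fs/lustre/mds/lustre1-MDT0000/stats
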
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss