Daniel,

We had similar findings with PSC. Moving from TCP to IB (OFED 1.1) client-to-server connections almost doubled metadata ops performance in single-client cases.
Sarp Oral

-----------------
Sarp Oral, Ph.D.
NCCS ORNL
865-574-2173

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of pauln
Sent: Tuesday, May 15, 2007 1:04 PM
To: Daniel Leaberry
Cc: [email protected]
Subject: Re: [Lustre-discuss] Metadata performance (again)

Daniel,
I've attached a spreadsheet containing data from a lustre create test which I ran some time ago. The purpose of the test was to determine how different hardware configs affected create performance. As you'll see from the data, the OST is actually the slowest component in the create chain.

I tested several OST and MDS configs and found that every disk-based OST configuration was susceptible to lengthy operation times interspersed throughout the test. This periodic slowness was correlated with disk activity on the OST - at the time I suspected that the activity was on behalf of the journal. Moving the entire OST onto a ramdisk increased the performance substantially. What I did not try was moving only the journal onto a ramdisk; it's possible that this would decrease the frequency of the OST's slow operations. If that is the case, you may be able to increase your create performance by purchasing a solid-state device for your OST journals (a rough sketch of such a setup appears after the quoted message below).

There are also some numbers for IB (openiblnd) included in the spreadsheet. I found that using IB lowered the operation time by 100-200 usecs, so it's true that switching to IB will speed things up.

I've also attached the raw data from Test1 (config1 and config13). Each line of raw data is the operation latency in seconds; operation number == line number.

paul

Daniel Leaberry wrote:
> I've closely followed the metadata mailing list posts over the last
> year. We've been running our small filesystem for a couple of months
> in semi-production mode. We don't have a traditional HPC workload
> (it's big image files with 5-10 small xml files) and we knew that
> lustre didn't excel at small files.
>
> I ordered the beefiest MDS I could (quad proc dual core opterons with
> 16GB ram) and put it on the fastest array I could (14 drive raid 10
> with 15k rpm disks). Still, as always, I'm wondering if I can do better.
>
> Everything runs over tcp/ip with no jumbo frames. My standard test is
> to simply track how many opens we do every 10 seconds. I run the
> following command and keep track of the results. We never really
> exceed ~2000 opens/second. Our workload often involves downloading
> ~50000 small (4-10k) xml files as fast as possible.
>
> I'm just interested in what other lustre gurus have to say about my
> results. I've tested different lru_size amounts (makes little
> difference) and portals debug is off. My understanding is that the
> biggest performance increase I would see is moving to infiniband
> instead of tcp interconnects.
>
> Thanks,
> [EMAIL PROTECTED] lustre]# echo 0 > /proc/fs/lustre/mds/lustre1-MDT0000/stats; sleep 10; cat /proc/fs/lustre/mds/lustre1-MDT0000/stats
> snapshot_time             1179180513.905326 secs.usecs
> open                      14948 samples [reqs]
> close                     7456 samples [reqs]
> mknod                     0 samples [reqs]
> link                      0 samples [reqs]
> unlink                    110 samples [reqs]
> mkdir                     0 samples [reqs]
> rmdir                     0 samples [reqs]
> rename                    99 samples [reqs]
> getxattr                  0 samples [reqs]
> setxattr                  0 samples [reqs]
> iocontrol                 0 samples [reqs]
> get_info                  0 samples [reqs]
> set_info_async            0 samples [reqs]
> attach                    0 samples [reqs]
> detach                    0 samples [reqs]
> setup                     0 samples [reqs]
> precleanup                0 samples [reqs]
> cleanup                   0 samples [reqs]
> process_config            0 samples [reqs]
> postrecov                 0 samples [reqs]
> add_conn                  0 samples [reqs]
> del_conn                  0 samples [reqs]
> connect                   0 samples [reqs]
> reconnect                 0 samples [reqs]
> disconnect                0 samples [reqs]
> statfs                    27 samples [reqs]
> statfs_async              0 samples [reqs]
> packmd                    0 samples [reqs]
> unpackmd                  0 samples [reqs]
> checkmd                   0 samples [reqs]
> preallocate               0 samples [reqs]
> create                    0 samples [reqs]
> destroy                   0 samples [reqs]
> setattr                   389 samples [reqs]
> setattr_async             0 samples [reqs]
> getattr                   3467 samples [reqs]
> getattr_async             0 samples [reqs]
> brw                       0 samples [reqs]
> brw_async                 0 samples [reqs]
> prep_async_page           0 samples [reqs]
> queue_async_io            0 samples [reqs]
> queue_group_io            0 samples [reqs]
> trigger_group_io          0 samples [reqs]
> set_async_flags           0 samples [reqs]
> teardown_async_page       0 samples [reqs]
> merge_lvb                 0 samples [reqs]
> adjust_kms                0 samples [reqs]
> punch                     0 samples [reqs]
> sync                      0 samples [reqs]
> migrate                   0 samples [reqs]
> copy                      0 samples [reqs]
> iterate                   0 samples [reqs]
> preprw                    0 samples [reqs]
> commitrw                  0 samples [reqs]
> enqueue                   0 samples [reqs]
> match                     0 samples [reqs]
> change_cbdata             0 samples [reqs]
> cancel                    0 samples [reqs]
> cancel_unused             0 samples [reqs]
> join_lru                  0 samples [reqs]
> init_export               0 samples [reqs]
> destroy_export            0 samples [reqs]
> extent_calc               0 samples [reqs]
> llog_init                 0 samples [reqs]
> llog_finish               0 samples [reqs]
> pin                       0 samples [reqs]
> unpin                     0 samples [reqs]
> import_event              0 samples [reqs]
> notify                    0 samples [reqs]
> health_check              0 samples [reqs]
> quotacheck                0 samples [reqs]
> quotactl                  0 samples [reqs]
> ping                      0 samples [reqs]
>
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
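[Editor's note: for readers wanting to try the external-journal idea Paul raises above, here is a rough sketch of one way it could be set up on an ldiskfs-backed OST. The device names (/dev/sdc for the fast journal device, /dev/sdb for the OST), fsname, index, and mgsnode are placeholders, and this assumes the mkfs.lustre tooling from Lustre 1.6; it is not the configuration Paul tested.]

  # Create a dedicated external journal on the fast (solid-state or ramdisk)
  # device; its block size should match the OST filesystem's block size.
  mke2fs -O journal_dev -b 4096 /dev/sdc

  # Format the OST so that its ldiskfs journal lives on that device.
  mkfs.lustre --ost --fsname=lustre1 --index=0 --mgsnode=mgs@tcp0 \
      --mkfsoptions="-J device=/dev/sdc -b 4096" /dev/sdb

Whether this actually reduces the slow outliers Paul observed would need to be measured; it only moves journal traffic off the data disks, not the data writes themselves.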

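[Editor's note: a minimal sketch of wrapping Daniel's sampling command, quoted above, in a loop so opens/second can be logged continuously. The stats path is the one from his message, and the awk fields assume the "open NNN samples [reqs]" line format shown in his output.]

  #!/bin/sh
  # Log MDS opens/second over repeated 10-second windows.
  STATS=/proc/fs/lustre/mds/lustre1-MDT0000/stats
  while true; do
      echo 0 > $STATS                 # zero the counters
      sleep 10
      opens=$(awk '$1 == "open" {print $2}' $STATS)
      echo "$(date +%T)  opens/sec: $((opens / 10))"
  done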