Andreas Dilger wrote:
On May 15, 2007 13:03 -0400, pauln wrote:
I've attached a spreadsheet containing data from a Lustre create test
which I ran some time ago. The purpose of the test was to determine how
different hardware configs affected create performance. As you'll see
from the data, the OST is actually the slowest component in the create
chain. I tested several OST and MDS configs and found that every
disk-based OST configuration was susceptible to lengthy operation times
interspersed throughout the test. This periodic slowness was correlated
with disk activity on the OST - at the time I suspected that the
activity was on behalf of the journal. Moving the entire OST onto a
ramdisk increased the performance substantially.
Paul,
what version of Lustre were you testing? How large are the ext3 inodes on
the OSTs (can be seen with "dumpe2fs -h /dev/{ostdev}")? What is the
default stripe count?
If you are running 1.4.6 and the ext3 inode size is 128 bytes, then there
can be a significant performance hit due to extra metadata being stored
on the OSTs. This is not an issue with filesystems using a newer Lustre
release.
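For reference, something like the following shows the current inode size on
an OST (substitute your own device for the placeholder):

  dumpe2fs -h /dev/{ostdev} 2>/dev/null | grep -i "inode size"

If it reports 128, I believe the extra per-object metadata no longer fits
inside the inode and spills into an external EA block, which costs extra
seeks. Filesystems formatted with larger inodes (e.g. "-I 256" to mke2fs,
assuming your e2fsprogs supports that option) keep it inside the inode.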
The Lustre version was 1.4.6.1 (RHEL kernel 2.6.9-34). I used the
default inode size and only had 1 OST. Can you briefly describe the
problems with 128-byte inodes and suggest a more optimal size?
thanks,
paul
What I did not try was moving only the journal onto a ramdisk. It's
possible that this would decrease the frequency of the OST's slow
operations. If that is the case, you may be able to increase your
create performance by purchasing a solid-state device for your OST
journals.
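If anyone wants to try that, an external ext3 journal is straightforward to
set up; a rough sketch (device names here are only placeholders, I haven't
verified this against a live OST, and the filesystem must be unmounted and
clean while you do it):

  # create a dedicated journal device on the fast disk / SSD
  mke2fs -O journal_dev /dev/fastdev

  # drop the internal journal, then point the OST filesystem at the new one
  tune2fs -O ^has_journal /dev/ostdev
  tune2fs -j -J device=/dev/fastdev /dev/ostdev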
There are also some numbers for IB (openiblnd) included in the
spreadsheet. I found that using IB lowered the operation time by 100-200
usecs, so it's true that switching to IB will speed things up.
I've also attached the raw data from Test1 (config1 and config13). Each
line of raw data is the operation latency in seconds, operation number
== line number.
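The raw files are trivial to post-process with awk if you want to poke at
them; for example, to get the mean latency and count the outliers above
100 ms (the filename is just an example):

  awk '{ sum += $1; if ($1 > 0.1) slow++ }
       END { printf "ops=%d mean=%.6fs slow(>100ms)=%d\n", NR, sum/NR, slow }' test1.dat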
paul
Daniel Leaberry wrote:
I've closely followed the metadata mailing list posts over the last
year. We've been running our small filesystem for a couple of months
in semi-production mode. We don't have a traditional HPC workload
(it's big image files with 5-10 small XML files) and we knew that
Lustre didn't excel at small files.
I ordered the beefiest MDS I could (quad-processor, dual-core Opterons
with 16 GB of RAM) and put it on the fastest array I could (14-drive RAID
10 with 15K RPM disks). Still, as always, I'm wondering if I can do better.
Everything runs over TCP/IP with no jumbo frames. My standard test is
simply to track how many opens we do every 10 seconds. I run the
following command and keep track of the results. We never really
exceed ~2000 opens/second. Our workload often involves downloading
~50000 small (4-10 KB) XML files as fast as possible.
I'm just interested in what other Lustre gurus have to say about my
results. I've tested different lru_size values (they make little
difference) and Portals debug is off. My understanding is that the
biggest performance increase I would see is moving to InfiniBand
instead of TCP interconnects.
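For what it's worth, the lru_size experiments were just along these lines on
the clients (the /proc path is from memory and the value is only an example,
so treat this as approximate):

  for ns in /proc/fs/lustre/ldlm/namespaces/*; do
      echo 1200 > $ns/lru_size   # example value
  done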
Thanks,
[EMAIL PROTECTED] lustre]# echo 0 >
/proc/fs/lustre/mds/lustre1-MDT0000/stats; sleep 10; cat
/proc/fs/lustre/mds/lustre1-MDT0000/stats
snapshot_time 1179180513.905326 secs.usecs
open 14948 samples [reqs]
close 7456 samples [reqs]
mknod 0 samples [reqs]
link 0 samples [reqs]
unlink 110 samples [reqs]
mkdir 0 samples [reqs]
rmdir 0 samples [reqs]
rename 99 samples [reqs]
getxattr 0 samples [reqs]
setxattr 0 samples [reqs]
iocontrol 0 samples [reqs]
get_info 0 samples [reqs]
set_info_async 0 samples [reqs]
attach 0 samples [reqs]
detach 0 samples [reqs]
setup 0 samples [reqs]
precleanup 0 samples [reqs]
cleanup 0 samples [reqs]
process_config 0 samples [reqs]
postrecov 0 samples [reqs]
add_conn 0 samples [reqs]
del_conn 0 samples [reqs]
connect 0 samples [reqs]
reconnect 0 samples [reqs]
disconnect 0 samples [reqs]
statfs 27 samples [reqs]
statfs_async 0 samples [reqs]
packmd 0 samples [reqs]
unpackmd 0 samples [reqs]
checkmd 0 samples [reqs]
preallocate 0 samples [reqs]
create 0 samples [reqs]
destroy 0 samples [reqs]
setattr 389 samples [reqs]
setattr_async 0 samples [reqs]
getattr 3467 samples [reqs]
getattr_async 0 samples [reqs]
brw 0 samples [reqs]
brw_async 0 samples [reqs]
prep_async_page 0 samples [reqs]
queue_async_io 0 samples [reqs]
queue_group_io 0 samples [reqs]
trigger_group_io 0 samples [reqs]
set_async_flags 0 samples [reqs]
teardown_async_page 0 samples [reqs]
merge_lvb 0 samples [reqs]
adjust_kms 0 samples [reqs]
punch 0 samples [reqs]
sync 0 samples [reqs]
migrate 0 samples [reqs]
copy 0 samples [reqs]
iterate 0 samples [reqs]
preprw 0 samples [reqs]
commitrw 0 samples [reqs]
enqueue 0 samples [reqs]
match 0 samples [reqs]
change_cbdata 0 samples [reqs]
cancel 0 samples [reqs]
cancel_unused 0 samples [reqs]
join_lru 0 samples [reqs]
init_export 0 samples [reqs]
destroy_export 0 samples [reqs]
extent_calc 0 samples [reqs]
llog_init 0 samples [reqs]
llog_finish 0 samples [reqs]
pin 0 samples [reqs]
unpin 0 samples [reqs]
import_event 0 samples [reqs]
notify 0 samples [reqs]
health_check 0 samples [reqs]
quotacheck 0 samples [reqs]
quotactl 0 samples [reqs]
ping 0 samples [reqs]
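A slightly friendlier variant of the same check, which prints an approximate
opens/sec figure each interval (same /proc path as above; the arithmetic is
just the sample count divided by the sleep time):

  while true; do
      echo 0 > /proc/fs/lustre/mds/lustre1-MDT0000/stats
      sleep 10
      awk '/^open / { printf "%.0f opens/sec\n", $2/10 }' \
          /proc/fs/lustre/mds/lustre1-MDT0000/stats
  done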
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss