On May 15, 2007 13:03 -0400, pauln wrote:
> I've attached a spreadsheet containing data from a Lustre create test
> which I ran some time ago. The purpose of the test was to determine how
> different hardware configs affected create performance. As you'll see
> from the data, the OST is actually the slowest component in the create
> chain. I tested several OST and MDS configs and found that every
> disk-based OST configuration was susceptible to lengthy operation times
> interspersed throughout the test. This periodic slowness was correlated
> with disk activity on the OST; at the time I suspected that the
> activity was on behalf of the journal. Moving the entire OST onto a
> ramdisk increased performance substantially.
Paul,
what version of Lustre were you testing? How large are the ext3 inodes on
the OSTs (can be seen with "dumpe2fs -h /dev/{ostdev}")? What is the
default stripe count?
If you are running 1.4.6 and the ext3 inode size is 128 bytes, then there
can be a significant performance hit due to extra metadata being stored
on the OSTs. This is not an issue for filesystems formatted with a newer
Lustre release.
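
To check both, something like the following should do (the device and
mount point here are just examples):

    # on an OSS node, dump the superblock header and look for the inode size
    dumpe2fs -h /dev/sdb1 2>/dev/null | grep -i 'inode size'

    # on a client, show the stripe settings actually in effect for a
    # directory (files created there inherit them)
    lfs getstripe /mnt/lustre/testdir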
> What I did not try was moving only the journal onto a ramdisk; it's
> possible that this would decrease the frequency of the OST's slow
> operations. If that is the case, you may be able to increase your
> create performance by purchasing a solid-state device for your OST
> journals.
>
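For anyone who wants to try that, here is a rough plain-ext3 sketch of
setting up an external journal (the device names are illustrative, both
commands destroy existing data, and a real Lustre setup would need the
equivalent options passed through its own formatting tools):

    # format the solid-state (or ramdisk) device as an external journal
    mke2fs -O journal_dev /dev/ssd0

    # create the OST filesystem pointing at that external journal
    mke2fs -J device=/dev/ssd0 /dev/ostdev
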
> There are also some numbers for IB (openibnld) included in the
> spreadsheet. I found that using IB lowered the operation time by
> 100-200 usecs, so it's true that switching to IB will speed things up.
>
> I've also attached the raw data from Test1 (config1 and config13). Each
> line of the raw data is the operation latency in seconds; the operation
> number equals the line number.
> paul
>
>
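A quick way to scan raw files like these for the slow outliers Paul
mentions above (the threshold and filename are just examples):

    # print operation number (== line number) and latency for ops over 100ms
    awk '$1 > 0.1 { print NR, $1 }' test1-config1.dat
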
> Daniel Leaberry wrote:
> >I've closely followed the metadata mailing list posts over the last
> >year. We've been running our small filesystem for a couple of months
> >in semi-production mode. We don't have a traditional HPC workload
> >(it's big image files with 5-10 small XML files each), and we knew
> >that Lustre didn't excel at small files.
> >
> >I ordered the beefiest MDS I could (quad-socket, dual-core Opterons
> >with 16GB RAM) and put it on the fastest array I could (a 14-drive
> >RAID 10 with 15k RPM disks). Still, as always, I'm wondering if I can
> >do better.
> >
> >Everything runs over TCP/IP with no jumbo frames. My standard test is
> >simply to track how many opens we do every 10 seconds: I run the
> >command below and keep track of the results. We never really exceed
> >~2000 opens/second. Our workload often involves downloading ~50000
> >small (4-10k) XML files as fast as possible.
> >
> >I'm just interested in what other Lustre gurus have to say about my
> >results. I've tested different lru_size values (they make little
> >difference), and portals debug is off. My understanding is that the
> >biggest performance increase I would see is from moving to InfiniBand
> >instead of TCP interconnects.
> >
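(For reference, the lru_size Daniel mentions is tuned per LDLM namespace
under /proc; a sketch, with the value just an example:

    # raise the client-side lock LRU size on all namespaces
    for f in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do
        echo 1200 > $f
    done
)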
> >Thanks,
> >[EMAIL PROTECTED] lustre]# echo 0 >
> >/proc/fs/lustre/mds/lustre1-MDT0000/stats; sleep 10; cat
> >/proc/fs/lustre/mds/lustre1-MDT0000/stats
> >snapshot_time 1179180513.905326 secs.usecs
> >open 14948 samples [reqs]
> >close 7456 samples [reqs]
> >mknod 0 samples [reqs]
> >link 0 samples [reqs]
> >unlink 110 samples [reqs]
> >mkdir 0 samples [reqs]
> >rmdir 0 samples [reqs]
> >rename 99 samples [reqs]
> >getxattr 0 samples [reqs]
> >setxattr 0 samples [reqs]
> >iocontrol 0 samples [reqs]
> >get_info 0 samples [reqs]
> >set_info_async 0 samples [reqs]
> >attach 0 samples [reqs]
> >detach 0 samples [reqs]
> >setup 0 samples [reqs]
> >precleanup 0 samples [reqs]
> >cleanup 0 samples [reqs]
> >process_config 0 samples [reqs]
> >postrecov 0 samples [reqs]
> >add_conn 0 samples [reqs]
> >del_conn 0 samples [reqs]
> >connect 0 samples [reqs]
> >reconnect 0 samples [reqs]
> >disconnect 0 samples [reqs]
> >statfs 27 samples [reqs]
> >statfs_async 0 samples [reqs]
> >packmd 0 samples [reqs]
> >unpackmd 0 samples [reqs]
> >checkmd 0 samples [reqs]
> >preallocate 0 samples [reqs]
> >create 0 samples [reqs]
> >destroy 0 samples [reqs]
> >setattr 389 samples [reqs]
> >setattr_async 0 samples [reqs]
> >getattr 3467 samples [reqs]
> >getattr_async 0 samples [reqs]
> >brw 0 samples [reqs]
> >brw_async 0 samples [reqs]
> >prep_async_page 0 samples [reqs]
> >queue_async_io 0 samples [reqs]
> >queue_group_io 0 samples [reqs]
> >trigger_group_io 0 samples [reqs]
> >set_async_flags 0 samples [reqs]
> >teardown_async_page 0 samples [reqs]
> >merge_lvb 0 samples [reqs]
> >adjust_kms 0 samples [reqs]
> >punch 0 samples [reqs]
> >sync 0 samples [reqs]
> >migrate 0 samples [reqs]
> >copy 0 samples [reqs]
> >iterate 0 samples [reqs]
> >preprw 0 samples [reqs]
> >commitrw 0 samples [reqs]
> >enqueue 0 samples [reqs]
> >match 0 samples [reqs]
> >change_cbdata 0 samples [reqs]
> >cancel 0 samples [reqs]
> >cancel_unused 0 samples [reqs]
> >join_lru 0 samples [reqs]
> >init_export 0 samples [reqs]
> >destroy_export 0 samples [reqs]
> >extent_calc 0 samples [reqs]
> >llog_init 0 samples [reqs]
> >llog_finish 0 samples [reqs]
> >pin 0 samples [reqs]
> >unpin 0 samples [reqs]
> >import_event 0 samples [reqs]
> >notify 0 samples [reqs]
> >health_check 0 samples [reqs]
> >quotacheck 0 samples [reqs]
> >quotactl 0 samples [reqs]
> >ping 0 samples [reqs]
> >
> >
>
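For what it's worth, the opens/sec number can be pulled straight out of
that stats output; a sketch using the same path as in Daniel's command:

    # zero the counters, wait 10 seconds, then report the open rate
    echo 0 > /proc/fs/lustre/mds/lustre1-MDT0000/stats
    sleep 10
    awk '$1 == "open" { printf "%d opens/sec\n", $2 / 10 }' \
        /proc/fs/lustre/mds/lustre1-MDT0000/stats
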
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss