Hi Lex,

If you are using Lustre for small files, another blog post of mine would be helpful. Have a look at:

http://blogs.sun.com/atulvid/entry/improving_performance_of_small_files

The post includes some tips on improving small-file performance and also has recommendations on tools to use for benchmarking.
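If you want a quick, rough baseline before and after any tuning, a small script along these lines can also help (this is just a sketch of mine, not something from the blog post; the mount point, file count, and file size are example values you would adjust for your own setup):

#!/usr/bin/env python
# Very rough small-file benchmark: create N small files, then stat them all,
# and report files/second for each phase.  Names and sizes are examples only.
import os
import time

TEST_DIR  = "/mnt/lustre/smallfile_test"   # hypothetical client mount point
NUM_FILES = 10000
FILE_SIZE = 4096                           # 4 KiB per file

if not os.path.isdir(TEST_DIR):
    os.makedirs(TEST_DIR)

data = "x" * FILE_SIZE

# phase 1: creates (exercises the MDS plus one small OST write per file)
t0 = time.time()
for i in range(NUM_FILES):
    f = open(os.path.join(TEST_DIR, "f%06d" % i), "w")
    f.write(data)
    f.close()
create_rate = NUM_FILES / (time.time() - t0)

# phase 2: stats (mostly metadata traffic; on Lustre this mainly hits the
# MDS unless the attributes are still cached on the client)
t0 = time.time()
for i in range(NUM_FILES):
    os.stat(os.path.join(TEST_DIR, "f%06d" % i))
stat_rate = NUM_FILES / (time.time() - t0)

print("creates: %.0f files/sec, stats: %.0f files/sec" % (create_rate, stat_rate))

Run it a couple of times and compare against the same script on a local disk; a dedicated benchmark will give more rigorous numbers, but this is usually enough to show whether the metadata path is the limiter.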
Cheers,
_Atul

Lex wrote:
> > did you measure the performance of this system before lustre?
>
> Please tell me exactly what information would be useful to help me
> diagnose our problem.
>
> > specifically, your symptoms make it look like your disk system
> > can't handle the load.  since you have lots of small activity,
> > the issue wouldn't be bandwidth, but latency.  I've normally only
> > seen this on the MDS, where metadata traffic can generate quite
> > high numbers of transactions, even though the bandwidth is low.
> >
> > for instance, is the MDS volume a slow-write form of raid like
> > raid5 or raid6?  MDS activity is mainly small, synchronous
> > transactions such as directory updates, which is why the MDS should
> > be on raid10.
>
> We use raid10 for our MDS and it is fairly idle.  Below is some
> information about load average and network traffic (output from the
> w and bmon commands).  It isn't high enough to cause the delay, right?
>
> load average: 0.05, 0.10, 0.09
>
>   Name              RX                      TX
>   ──────────────────┬───────────────────────┬───────────────────────
>   MDS1 (local)      │ Rate        #      %  │ Rate        #      %
>    0 lo             │ 0 B         0         │ 0 B         0
>    1 eth0           │ 22 B        0         │ 344.59KiB   736
>    2 eth1           │ 670.49KiB   1.37K     │ 267.29KiB   592
>    3 bond0          │ 670.51KiB   1.38K     │ 611.88KiB   1.30K
>
> > > ( there are quite a lot of small files: Linux soft links )  Files
> > > are "striped" over
> >
> > in a normal filesystem, symlinks are stored in the inode itself,
> > at least for short symlink targets.  I guess that applies to
> > lustre as well - the symlink would be on the MDS.  but there are
> > issues related to the size of the inode on the MDS, since striping
> > information is also stored in EAs, which are also hopefully within
> > the file's inode.  when there's too much to fit into an inode,
> > performance suffers, since the same metadata operations now require
> > extra seeks.
>
> I will consider this.
>
> > > each 2 OSTs, some are striped over all our OSTs ( fewer than 2
> > > OSTs parallel striping )
> >
> > whether it makes sense to stripe over all OSTs or not depends on
> > the sizes of your files.  but since you have only gigabit, it's
> > probably not a good idea.  (that is, accessing a striped file
> > won't be any faster, since it'll bottleneck on the client's
> > network port.)
>
> Could you please tell me in detail what the disadvantage of 1 Gig
> Ethernet is when using Lustre, and what exactly the bottleneck in the
> client's network port is?  (I tried installing more NICs in the
> client and bonding them together, but it didn't help.)
>
> I found in a paper (via Google) that with a bonded device of
> 3 x 1 Gig Ethernet the problem should be significantly improved.
> But in our case I can't even reach the limit of 1 Gig!
>
> Do you have any ideas about my issue?
>
> > I think you need to find out whether the performance problem is
> > merely due to latency (metadata rate) on the MDS.  looking at normal
> > performance metrics on the MDS when under load (/proc/partitions,
> > etc) might be able to show this.  even "vmstat 1" may be
> > informative, to see what sorts of blocks-per-second IO rates you're
> > getting.
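(A quick aside from me: if you re-run vmstat over a longer window, a throwaway helper like the one below can average the bi/bo columns for you. It is only a rough sketch on my side, nothing Lustre-specific; the script name is made up and the column positions assume the standard vmstat layout.)

#!/usr/bin/env python
# Average the bi/bo (blocks in / blocks out per second) columns of vmstat.
# Example use:  vmstat 1 30 | python vmstat_avg.py
# Assumes the usual vmstat column order: r b swpd free buff cache si so bi bo in cs ...
import sys

bi_total = bo_total = samples = 0
for line in sys.stdin:
    fields = line.split()
    # skip the two header lines; sample rows start with the numeric "r" column
    if len(fields) < 16 or not fields[0].isdigit():
        continue
    bi_total += int(fields[8])   # blocks read in per second
    bo_total += int(fields[9])   # blocks written out per second
    samples += 1

# note: the first vmstat sample is an average since boot, so for careful
# measurements you would discard it; this just reports the overall mean
if samples:
    print("samples: %d  avg bi: %.1f  avg bo: %.1f blocks/sec"
          % (samples, bi_total / float(samples), bo_total / float(samples)))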
> Here is the output of vmstat 1 over 10 seconds:
>
> r...@mds1: ~ # vmstat 1
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free    buff   cache   si   so   bi   bo    in    cs us sy id wa st
>  1  0    140 243968 3314424 432776    0    0    1    6     2     1  0  2 97  1  0
>  0  0    140 244092 3314424 432776    0    0    0    4  3037  6938  0  2 97  1  0
>  0  0    140 244092 3314424 432776    0    0    0    4  2980  6759  0  2 98  1  0
>  0  0    140 244216 3314424 432776    0    0    0   16  3574  8966  0  3 94  3  0
>  0  0    140 244092 3314424 432776    0    0    0    4  3511  8639  1  2 97  1  0
>  0  1    140 244092 3314424 432776    0    0    0   36  3549  8871  0  2 97  1  0
>  0  0    140 244092 3314424 432776    0    0    0    4  3085  7304  0  2 97  1  0
>  0  0    140 243968 3314424 432776    0    0    0   20  3199  7566  0  2 97  1  0
>  0  0    140 244092 3314424 432776    0    0    0   16  3294  7950  0  2 95  3  0
>  0  0    140 244092 3314424 432776    0    0    0    4  3336  8301  0  2 97  1  0
>
> and iostat -m 1 5:
>
> Linux 2.6.18-92.1.17.el5_lustre.1.8.0custom (MDS1)      02/02/2010
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.17    0.02    1.53    1.33    0.00   96.96
>
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sda               3.66         0.00         0.02      12304      79721
> drbd1             6.43         0.00         0.02      10709      70302
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.75    0.00    2.24    0.75    0.00   96.26
>
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sda               1.00         0.00         0.00          0          0
> drbd1             1.00         0.00         0.00          0          0
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00    1.75    1.00    0.00   97.24
>
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sda               4.00         0.00         0.05          0          0
> drbd1             1.00         0.00         0.00          0          0
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00    2.00    3.50    0.00   94.50
>
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sda               3.00         0.00         0.02          0          0
> drbd1             4.00         0.00         0.02          0          0
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00    2.49    0.75    0.00   96.76
>
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sda               1.00         0.00         0.00          0          0
> drbd1             1.00         0.00         0.00          0          0
>
> I don't think our MDS is too busy (please do correct me if I am
> reading our own situation wrong).
>
> Do you have any ideas or comments?
>
> Many, many thanks

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
