Arden, we also use dual-channel GigE (bond0), and in my tests I found that this works best:
options bonding miimon=100 mode=802.3ad xmit_hash_policy=layer3+4

This allows us to get roughly 250 MB/s transfers. Here is the iozone command I used:

iozone -t1 -i0 -i1 -r4m -s2g

You will not get any more performance unless you move to InfiniBand or another interconnect.

Jeffrey Alan Bennett wrote:
> Hi Arden,
>
> Are you obtaining more than 100 MB/sec from one client to one OST? Given
> that you are using 802.3ad link aggregation, it will determine the
> physical NIC by the other party's MAC address. So having multiple OSTs and
> multiple clients will improve the chances of using more than one NIC of
> the bond.
>
> What is the maximum performance you obtain on the client with two 1GbE?
>
> jeff
>
> ________________________________
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Arden Wiebe
> Sent: Sunday, January 25, 2009 12:08 AM
> To: [email protected]
> Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0
>
> So if one OST gets 200MiB/s and another OST gets 200MiB/s, does that make
> 400MiB/s, or is that not how to calculate throughput? I will eventually
> plug the right sequence into iozone to measure it.
>
> From my perspective it looks like ioio.ca/ioio.jpg ioio.ca/lustreone.png
> ioio.ca/lustretwo.png ioio.ca/lustrethree.png ioio.ca/lustrefour.png
>
> --- On Sat, 1/24/09, Arden Wiebe <[email protected]> wrote:
>
> From: Arden Wiebe <[email protected]>
> Subject: [Lustre-discuss] Plateau around 200MiB/s bond0
> To: [email protected]
> Date: Saturday, January 24, 2009, 6:04 PM
>
> 1-2948-SFP Plus Baseline 3Com Switch
> 1-MGS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
> 1-MDT bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
> 2-OSS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid6
> 1-MGS-CLIENT bond0(eth0,eth1,eth2,eth3,eth4,eth5)
> 1-CLIENT bond0(eth0,eth1)
> 1-CLIENT eth0
> 1-CLIENT eth0
>
> So far I have failed to create an external journal for the MDT, MGS and
> both OSSs.
> How do I add the external journal to /etc/fstab? Specifically, what is the
> output of e2label /dev/sdb, and what options go with it in fstab?
>
> [r...@lustreone ~]# cat /proc/fs/lustre/devices
>   0 UP mgs MGS MGS 17
>   1 UP mgc mgc192.168....@tcp 876c20af-aaec-1da0-5486-1fc61ec8cd15 5
>   2 UP lov ioio-clilov-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 4
>   3 UP mdc ioio-MDT0000-mdc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
>   4 UP osc ioio-OST0000-osc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
>   5 UP osc ioio-OST0001-osc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
> [r...@lustreone ~]# lfs df -h
> UUID                  bytes    Used  Available  Use%  Mounted on
> ioio-MDT0000_UUID    815.0G  534.0M     767.9G    0%  /mnt/ioio[MDT:0]
> ioio-OST0000_UUID      3.6T   28.4G       3.4T    0%  /mnt/ioio[OST:0]
> ioio-OST0001_UUID      3.6T   18.0G       3.4T    0%  /mnt/ioio[OST:1]
>
> filesystem summary:    7.2T   46.4G       6.8T    0%  /mnt/ioio
>
> [r...@lustreone ~]# cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)
>
> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
> Transmit Hash Policy: layer2 (0)
> MII Status: up
> MII Polling Interval (ms): 100
> Up Delay (ms): 0
> Down Delay (ms): 0
>
> 802.3ad info
> LACP rate: slow
> Active Aggregator Info:
>         Aggregator ID: 1
>         Number of ports: 1
>         Actor Key: 17
>         Partner Key: 1
>         Partner Mac Address: 00:00:00:00:00:00
>
> Slave Interface: eth0
> MII Status: up
> Link Failure Count: 1
> Permanent HW addr: 00:1b:21:28:77:db
> Aggregator ID: 1
>
> Slave Interface: eth1
> MII Status: up
> Link Failure Count: 1
> Permanent HW addr: 00:1b:21:28:77:6c
> Aggregator ID: 2
>
> Slave Interface: eth3
> MII Status: up
> Link Failure Count: 0
> Permanent HW addr: 00:22:15:06:3a:94
> Aggregator ID: 3
>
> Slave Interface: eth2
> MII Status: up
> Link Failure Count: 0
> Permanent HW addr: 00:22:15:06:3a:93
> Aggregator ID: 4
>
> Slave Interface: eth4
> MII Status: up
> Link Failure Count: 0
> Permanent HW addr: 00:22:15:06:3a:95
> Aggregator ID: 5
>
> Slave Interface: eth5
> MII Status: up
> Link Failure Count: 0
> Permanent HW addr: 00:22:15:06:3a:96
> Aggregator ID: 6
> [r...@lustreone ~]# cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 sdb[0] sdc[1]
>       976762496 blocks [2/2] [UU]
>
> unused devices: <none>
> [r...@lustreone ~]# cat /etc/fstab
> LABEL=/                 /          ext3    defaults                 1 1
> tmpfs                   /dev/shm   tmpfs   defaults                 0 0
> devpts                  /dev/pts   devpts  gid=5,mode=620           0 0
> sysfs                   /sys       sysfs   defaults                 0 0
> proc                    /proc      proc    defaults                 0 0
> LABEL=MGS               /mnt/mgs   lustre  defaults,_netdev         0 0
> 192.168....@tcp0:/ioio  /mnt/ioio  lustre  defaults,_netdev,noauto  0 0
>
> [r...@lustreone ~]# ifconfig
> bond0   Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
>         inet addr:192.168.0.7  Bcast:192.168.0.255  Mask:255.255.255.0
>         inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
>         UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1
>         RX packets:5457486 errors:0 dropped:0 overruns:0 frame:0
>         TX packets:4665580 errors:0 dropped:0 overruns:0 carrier:0
>         collisions:0 txqueuelen:0
>         RX bytes:12376680079 (11.5 GiB)  TX bytes:34438742885 (32.0 GiB)
>
> eth0    Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
>         inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
>         UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
>         RX packets:3808615 errors:0 dropped:0 overruns:0 frame:0
>         TX packets:4664270 errors:0 dropped:0 overruns:0 carrier:0
>         collisions:0 txqueuelen:1000
>         RX bytes:12290700380 (11.4 GiB)  TX bytes:34438581771 (32.0 GiB)
>         Base address:0xec00 Memory:febe0000-fec00000
>
> From what I have read, not having an external journal configured for the
> OSTs is a sure recipe for slowness, which I would rather not have,
> considering the goal is around 350MiB/s or more, which should be
> obtainable.
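A side note on the bonding status quoted above: the transmit hash policy is layer2, and each slave sits in its own single-port aggregator, so LACP is not actually spreading flows. To see why the hash policy choice matters, here is a small sketch of the two formulas as described in the kernel's bonding documentation (not the kernel source itself). The MACs and the client IP 192.168.0.7 come from the output above; the peer OSS IP, the source ports, and the pairing of those two MACs are illustrative assumptions, and 988 is Lustre's usual TCP port:

```python
# Sketch of the bonding driver's slave-selection hashes, per
# Documentation/networking/bonding.txt. Two slaves, as on a 2x1GbE client.

def layer2_hash(src_mac: bytes, dst_mac: bytes, n_slaves: int) -> int:
    """layer2: (source MAC XOR destination MAC) modulo slave count."""
    x = int.from_bytes(src_mac, "big") ^ int.from_bytes(dst_mac, "big")
    return x % n_slaves

def layer34_hash(src_ip: int, dst_ip: int, src_port: int, dst_port: int,
                 n_slaves: int) -> int:
    """layer3+4: ((sport XOR dport) XOR ((saddr XOR daddr) AND 0xffff)) mod count."""
    return ((src_port ^ dst_port) ^ ((src_ip ^ dst_ip) & 0xffff)) % n_slaves

client_mac = bytes.fromhex("001b212877db")   # eth0 in the output above
oss_mac    = bytes.fromhex("002215063a93")   # eth2 above, used as an illustrative peer
client_ip  = 0xC0A80007                      # 192.168.0.7, from the ifconfig output
oss_ip     = 0xC0A80008                      # assumed OSS address
lustre_port = 988                            # Lustre's well-known TCP port

# layer2: every flow between one client and one OSS hashes to the same slave,
# so a single NIC carries all traffic for that peer pair.
print(layer2_hash(client_mac, oss_mac, 2))

# layer3+4: different source ports can land on different slaves.
for sport in (1021, 1022):
    print(layer34_hash(client_ip, oss_ip, sport, lustre_port, 2))
```

Under layer2, both directions of a single client/OSS conversation ride one physical link, capping that pair near 1 Gb/s; layer3+4 lets separate TCP connections spread across the slaves, which fits the roughly 250 MB/s reported at the top of this message.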
>
> Here is how I formatted the raid6 device on both OSSs, which have
> identical disks:
>
> [r...@lustrefour ~]# fdisk -l
>
> Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1   *           1      121601   976760001   83  Linux
>
> Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
> Disk /dev/sdb doesn't contain a valid partition table
>
> Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
> Disk /dev/sdc doesn't contain a valid partition table
>
> Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
> Disk /dev/sdd doesn't contain a valid partition table
>
> Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
> Disk /dev/sde doesn't contain a valid partition table
>
> Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
> Disk /dev/sdf doesn't contain a valid partition table
>
> Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
> Disk /dev/sdg doesn't contain a valid partition table
>
> Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
> Disk /dev/sdh doesn't contain a valid partition table
>
> Disk /dev/md0: 4000.8 GB, 4000819183616 bytes
> 2 heads, 4 sectors/track, 976762496 cylinders
> Units = cylinders of 8 * 512 = 4096 bytes
>
> Disk /dev/md0 doesn't contain a valid partition table
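As a quick sanity check on those sizes (a sketch, using the block counts reported elsewhere in this thread's /proc/mdstat listings): raid6 keeps (n - 2) disks' worth of data, so six ~1 TB members should yield exactly the 4000.8 GB /dev/md0 that fdisk shows.

```python
# RAID-6 usable capacity: (number of disks - 2) data members.
member_blocks = 976762496              # one 1 TB drive's capacity in 1 KiB blocks
n_disks = 6
usable_blocks = (n_disks - 2) * member_blocks
print(usable_blocks)                   # 3907049984 blocks
print(usable_blocks * 1024)            # 4000819183616 bytes, fdisk's md0 size exactly
```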
> [r...@lustrefour ~]#
>
> [r...@lustrefour ~]# mdadm --create --assume-clean /dev/md0 --level=6
> --chunk=128 --raid-devices=6 /dev/sd[cdefgh]
> [r...@lustrefour ~]# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid6 sdc[0] sdh[5] sdg[4] sdf[3] sde[2] sdd[1]
>       3907049984 blocks level 6, 128k chunk, algorithm 2 [6/6] [UUUUUU]
>       in: 16674 reads, 16217479 writes; out: 3022788 reads, 32865192 writes
>       7712698 in raid5d, 8264 out of stripes, 25661224 handle called
>       reads: 0 for rmw, 1710975 for rcw. zcopy writes: 4864584, copied writes: 16115932
>       0 delayed, 0 bit delayed, 0 active, queues: 0 in, 0 out
>       0 expanding overlap
>
> unused devices: <none>
>
> Followed with:
>
> [r...@lustrefour ~]# mkfs.lustre --ost --fsname=ioio
> --mgsnode=192.168....@tcp0 --mkfsoptions="-J device=/dev/sdb1" --reformat
> /dev/md0
>
> [r...@lustrefour ~]# mke2fs -b 4096 -O journal_dev /dev/sdb1
>
> But that was hard to reassemble on reboot, or at least it was before I
> used e2label and labelled things right. Question: how do I label the
> external journal in fstab, if at all? Right now I am only running
>
> [r...@lustrefour ~]# mkfs.lustre --fsname=ioio --ost
> --mgsnode=192.168....@tcp0 --reformat /dev/md0
>
> So just raid6, no external journal.
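On the external-journal question, a hedged sketch of the usual order, reusing the device names from the message above (untested here): the mke2fs of the journal device has to come before the mkfs that references it, the reverse of the sequence quoted above. As far as I know the journal is then found through the OST filesystem's own superblock, so it does not need its own fstab entry.

```shell
# Sketch only -- device names as in the message above, not verified here.

# 1. Create the external journal first, with a block size matching the fs:
mke2fs -b 4096 -O journal_dev /dev/sdb1

# 2. Then format the OST to use it:
mkfs.lustre --ost --fsname=ioio --mgsnode=192.168....@tcp0 \
    --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0

# 3. A label can also be set on the journal device for human reference
#    (the label name here is made up):
e2label /dev/sdb1 ost00-journal
```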
> [r...@lustrefour ~]# cat /etc/fstab
> LABEL=/                 /           ext3    defaults                 1 1
> tmpfs                   /dev/shm    tmpfs   defaults                 0 0
> devpts                  /dev/pts    devpts  gid=5,mode=620           0 0
> sysfs                   /sys        sysfs   defaults                 0 0
> proc                    /proc       proc    defaults                 0 0
> LABEL=ioio-OST0001      /mnt/ost00  lustre  defaults,_netdev         0 0
> 192.168....@tcp0:/ioio  /mnt/ioio   lustre  defaults,_netdev,noauto  0 0
>
> [r...@lustrefour ~]#
>
> [r...@lustreone bin]# ./ost-survey -s 4096 /mnt/ioio
> ./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168....@tcp
> Number of Active OST devices : 2
> Worst Read OST indx: 0 speed: 38.789337
> Best Read OST indx: 1 speed: 40.017201
> Read Average: 39.403269 +/- 0.613932 MB/s
> Worst Write OST indx: 0 speed: 49.227064
> Best Write OST indx: 1 speed: 78.673564
> Write Average: 63.950314 +/- 14.723250 MB/s
> Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
> ----------------------------------------------------
> 0     38.789      49.227       105.596    83.206
> 1     40.017      78.674       102.356    52.063
> [r...@lustreone bin]# ./ost-survey -s 1024 /mnt/ioio
> ./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168....@tcp
> Number of Active OST devices : 2
> Worst Read OST indx: 0 speed: 38.559620
> Best Read OST indx: 1 speed: 40.053787
> Read Average: 39.306704 +/- 0.747083 MB/s
> Worst Write OST indx: 0 speed: 71.623744
> Best Write OST indx: 1 speed: 82.764897
> Write Average: 77.194320 +/- 5.570577 MB/s
> Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
> ----------------------------------------------------
> 0     38.560      71.624       26.556     14.297
> 1     40.054      82.765       25.566     12.372
> [r...@lustreone bin]# dd of=/mnt/ioio/bigfileMGS if=/dev/zero bs=1048576
> 3536+0 records in
> 3536+0 records out
> 3707764736 bytes (3.7 GB) copied, 38.4775 seconds, 96.4 MB/s
>
> lustreone, lustretwo, lustrethree and lustrefour all have the same
> modprobe.conf:
>
> [r...@lustrefour ~]# cat /etc/modprobe.conf
> alias eth0 e1000
> alias eth1 e1000
> alias scsi_hostadapter pata_marvell
> alias scsi_hostadapter1 ata_piix
> options lnet
> networks=tcp
> alias eth2 sky2
> alias eth3 sky2
> alias eth4 sky2
> alias eth5 sky2
> alias bond0 bonding
> options bonding miimon=100 mode=4
> [r...@lustrefour ~]#
>
> When I do the same from all clients I can watch
> /usr/bin/gnome-system-monitor, and the send and receive rates of the
> various nodes reach a 209 MiB/s plateau? Uggh
>
> -----Inline Attachment Follows-----
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

--
Jeremy Mann
[email protected]

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu

Phone: (210) 567-2672

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
