Hi Arden,

Are you getting more than 100 MB/sec from one client to one OST? With 802.3ad link aggregation and the layer2 transmit hash policy, the physical NIC is chosen from the other party's MAC address, so a given client/server pair always uses the same slave link. Having multiple OSTs and multiple clients improves the chances of using more than one NIC of the bond.
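That slave-selection rule can be sketched numerically; this is my reconstruction from the kernel bonding documentation, not something posted in this thread. With xmit_hash_policy=layer2 the outgoing slave is picked from the low byte of the source and destination MAC addresses, so one client talking to one OSS always rides a single physical link:

```shell
#!/bin/sh
# Sketch of the layer2 transmit hash used by 802.3ad bonding (per the
# kernel bonding docs): slave index = (low byte of src MAC XOR low byte
# of dst MAC) modulo the number of slaves. A fixed MAC pair therefore
# always maps to the same slave, regardless of how many NICs are bonded.
layer2_slave() {
    # $1 = source MAC, $2 = destination MAC, $3 = number of slaves
    src=$(printf '%d' "0x${1##*:}")
    dst=$(printf '%d' "0x${2##*:}")
    echo $(( (src ^ dst) % $3 ))
}

# One node's MAC talking to one OSS MAC: the same slave every time.
layer2_slave 00:1b:21:28:77:db 00:22:15:06:3a:94 2   # -> 1
```

More OSTs and clients means more MAC pairs, which is what spreads traffic across the slaves; xmit_hash_policy=layer3+4 hashes on IP/port instead, though a single TCP stream still uses one link.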
What is the maximum performance you obtain on the client with two 1GbE?

jeff

________________________________
From: [email protected] [mailto:[email protected]] On Behalf Of Arden Wiebe
Sent: Sunday, January 25, 2009 12:08 AM
To: [email protected]
Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0

So if one OST gets 200MiB/s and another OST gets 200MiB/s, does that make 400MiB/s, or is that not how to calculate throughput? I will eventually plug the right sequence into iozone to measure it. From my perspective it looks like:

ioio.ca/ioio.jpg
ioio.ca/lustreone.png
ioio.ca/lustretwo.png
ioio.ca/lustrethree.png
ioio.ca/lustrefour.png

--- On Sat, 1/24/09, Arden Wiebe <[email protected]> wrote:

From: Arden Wiebe <[email protected]>
Subject: [Lustre-discuss] Plateau around 200MiB/s bond0
To: [email protected]
Date: Saturday, January 24, 2009, 6:04 PM

1-2948-SFP Plus Baseline 3Com Switch
1-MGS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
1-MDT bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
2-OSS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid6
1-MGS-CLIENT bond0(eth0,eth1,eth2,eth3,eth4,eth5)
1-CLIENT bond0(eth0,eth1)
1-CLIENT eth0
1-CLIENT eth0

So far I have failed to create an external journal for the MDT, MGS and both OSSs. How do I add the external journal to /etc/fstab? Specifically, given the label printed by e2label /dev/sdb, what options go in fstab?
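One note on the external-journal question above: an external ext3/ldiskfs journal is never listed in /etc/fstab. The filesystem's superblock records which journal device it uses, so only the OST itself gets an fstab entry. A sketch of the format-time setup, assuming /dev/sdb1 is the journal partition (the label name ioio-journal is made up for illustration):

```shell
# Sketch only; destructive commands, adjust devices to your layout.
# Create the journal device first, with the same 4096-byte block size
# the OST filesystem will use, and give it a label so it survives
# device renumbering across reboots.
mke2fs -O journal_dev -b 4096 -L ioio-journal /dev/sdb1

# Then reference the journal by label when formatting the OST; the
# journal itself never appears in /etc/fstab.
mkfs.lustre --ost --fsname=ioio --mgsnode=192.168....@tcp0 \
    --mkfsoptions="-J device=LABEL=ioio-journal" --reformat /dev/md0
```

The order matters: the journal must exist before the filesystem that points at it, and mke2fs accepts -J device=LABEL=... so the reference does not depend on /dev/sdb1 keeping its name.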
[r...@lustreone ~]# cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 17
  1 UP mgc mgc192.168....@tcp 876c20af-aaec-1da0-5486-1fc61ec8cd15 5
  2 UP lov ioio-clilov-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 4
  3 UP mdc ioio-MDT0000-mdc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
  4 UP osc ioio-OST0000-osc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
  5 UP osc ioio-OST0001-osc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5

[r...@lustreone ~]# lfs df -h
UUID                 bytes   Used    Available  Use%  Mounted on
ioio-MDT0000_UUID    815.0G  534.0M  767.9G     0%    /mnt/ioio[MDT:0]
ioio-OST0000_UUID    3.6T    28.4G   3.4T       0%    /mnt/ioio[OST:0]
ioio-OST0001_UUID    3.6T    18.0G   3.4T       0%    /mnt/ioio[OST:1]
filesystem summary:  7.2T    46.4G   6.8T       0%    /mnt/ioio

[r...@lustreone ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1
        Actor Key: 17
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:db
Aggregator ID: 1

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:6c
Aggregator ID: 2

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:94
Aggregator ID: 3

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:93
Aggregator ID: 4

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:95
Aggregator ID: 5

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:96
Aggregator ID: 6

[r...@lustreone ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb[0] sdc[1]
      976762496 blocks [2/2] [UU]

unused devices: <none>
[r...@lustreone ~]# cat /etc/fstab
LABEL=/                 /          ext3    defaults                 1 1
tmpfs                   /dev/shm   tmpfs   defaults                 0 0
devpts                  /dev/pts   devpts  gid=5,mode=620           0 0
sysfs                   /sys       sysfs   defaults                 0 0
proc                    /proc      proc    defaults                 0 0
LABEL=MGS               /mnt/mgs   lustre  defaults,_netdev         0 0
192.168....@tcp0:/ioio  /mnt/ioio  lustre  defaults,_netdev,noauto  0 0

[r...@lustreone ~]# ifconfig
bond0   Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
        inet addr:192.168.0.7  Bcast:192.168.0.255  Mask:255.255.255.0
        inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
        UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1
        RX packets:5457486 errors:0 dropped:0 overruns:0 frame:0
        TX packets:4665580 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:12376680079 (11.5 GiB)  TX bytes:34438742885 (32.0 GiB)

eth0    Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
        inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
        UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
        RX packets:3808615 errors:0 dropped:0 overruns:0 frame:0
        TX packets:4664270 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:12290700380 (11.4 GiB)  TX bytes:34438581771 (32.0 GiB)
        Base address:0xec00 Memory:febe0000-fec00000

From what I have read, not having an external journal configured for the OSTs is a sure recipe for slowness, which I would rather avoid considering the goal is around 350MiB/s or more, which should be attainable.
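Since the stated goal is around 350MiB/s, it is worth a back-of-the-envelope check on what the client-side network alone can carry; these figures are my arithmetic, not numbers from the thread:

```shell
#!/bin/sh
# Raw ceiling of n aggregated 1GbE links in MiB/s, before any TCP,
# Ethernet framing, or Lustre overhead: n * 10^9 bits/s / 8 / 2^20.
gbe_cap_mib() {
    # $1 = number of aggregated 1GbE links
    awk -v n="$1" 'BEGIN { printf "%.1f\n", n * 1e9 / 8 / (1024 * 1024) }'
}

gbe_cap_mib 1   # one link:    ~119.2 MiB/s
gbe_cap_mib 2   # two links:   ~238.4 MiB/s
gbe_cap_mib 3   # three links: ~357.6 MiB/s
```

So even with perfect aggregation, a client needs roughly three gigabit links before 350MiB/s is reachable, and protocol overhead shaves a few percent off these raw ceilings.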
Here is how I formatted the raid6 device on both OSSs, which have identical hardware:

[r...@lustrefour ~]# fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1      121601   976760001   83  Linux

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

(/dev/sdc through /dev/sdh report the same geometry and likewise contain no valid partition table)

Disk /dev/md0: 4000.8 GB, 4000819183616 bytes
2 heads, 4 sectors/track, 976762496 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

[r...@lustrefour ~]# mdadm --create --assume-clean /dev/md0 --level=6 --chunk=128 --raid-devices=6 \
    /dev/sd[cdefgh]

[r...@lustrefour ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdc[0] sdh[5] sdg[4] sdf[3] sde[2] sdd[1]
      3907049984 blocks level 6, 128k chunk, algorithm 2 [6/6] [UUUUUU]
      in: 16674 reads, 16217479 writes; out: 3022788 reads, 32865192 writes
      7712698 in raid5d, 8264 out of stripes, 25661224 handle called
      reads: 0 for rmw, 1710975 for rcw. zcopy writes: 4864584, copied writes: 16115932
      0 delayed, 0 bit delayed, 0 active, queues: 0 in, 0 out
      0 expanding overlap

unused devices: <none>

Followed with:

[r...@lustrefour ~]# mkfs.lustre --ost --fsname=ioio --mgsnode=192.168....@tcp0 --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0
[r...@lustrefour ~]# mke2fs -b 4096 -O journal_dev /dev/sdb1

But that was hard to reassemble on reboot, or at least it was before I used e2label and labeled things right. Question: how do I label the external journal in fstab, if at all? Right now I am only running:

[r...@lustrefour ~]# mkfs.lustre --fsname=ioio --ost --mgsnode=192.168....@tcp0 --reformat /dev/md0

So just raid6, no external journal.
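As a quick consistency check on the array above (my arithmetic, not a figure from the thread): RAID6 spends two members on parity, so six 976762496-block disks should leave exactly the 3907049984 blocks that /proc/mdstat reports for /dev/md0:

```shell
#!/bin/sh
# RAID6 keeps (n - 2) members' worth of data capacity.
raid6_usable_blocks() {
    # $1 = 1-KiB blocks per member, $2 = number of members (>= 4)
    echo $(( ($2 - 2) * $1 ))
}

raid6_usable_blocks 976762496 6   # -> 3907049984, matching mdstat
```

That is about 3.6 TiB per OST, which also lines up with the 3.6T that lfs df -h shows for each of ioio-OST0000 and ioio-OST0001.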
[r...@lustrefour ~]# cat /etc/fstab
LABEL=/                 /           ext3    defaults                 1 1
tmpfs                   /dev/shm    tmpfs   defaults                 0 0
devpts                  /dev/pts    devpts  gid=5,mode=620           0 0
sysfs                   /sys        sysfs   defaults                 0 0
proc                    /proc       proc    defaults                 0 0
LABEL=ioio-OST0001      /mnt/ost00  lustre  defaults,_netdev         0 0
192.168....@tcp0:/ioio  /mnt/ioio   lustre  defaults,_netdev,noauto  0 0
[r...@lustrefour ~]#

[r...@lustreone bin]# ./ost-survey -s 4096 /mnt/ioio
./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168....@tcp
Number of Active OST devices : 2
Worst Read OST indx: 0 speed: 38.789337
Best Read OST indx: 1 speed: 40.017201
Read Average: 39.403269 +/- 0.613932 MB/s
Worst Write OST indx: 0 speed: 49.227064
Best Write OST indx: 1 speed: 78.673564
Write Average: 63.950314 +/- 14.723250 MB/s
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0     38.789      49.227       105.596    83.206
1     40.017      78.674       102.356    52.063

[r...@lustreone bin]# ./ost-survey -s 1024 /mnt/ioio
./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168....@tcp
Number of Active OST devices : 2
Worst Read OST indx: 0 speed: 38.559620
Best Read OST indx: 1 speed: 40.053787
Read Average: 39.306704 +/- 0.747083 MB/s
Worst Write OST indx: 0 speed: 71.623744
Best Write OST indx: 1 speed: 82.764897
Write Average: 77.194320 +/- 5.570577 MB/s
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0     38.560      71.624       26.556     14.297
1     40.054      82.765       25.566     12.372

[r...@lustreone bin]# dd of=/mnt/ioio/bigfileMGS if=/dev/zero bs=1048576
3536+0 records in
3536+0 records out
3707764736 bytes (3.7 GB) copied, 38.4775 seconds, 96.4 MB/s

lustreone, lustretwo, lustrethree and lustrefour all have the same modprobe.conf:

[r...@lustrefour ~]# cat /etc/modprobe.conf
alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter pata_marvell
alias scsi_hostadapter1 ata_piix
options lnet networks=tcp
alias eth2 sky2
alias eth3 sky2
alias eth4 sky2
alias eth5 sky2
alias bond0 bonding
options bonding miimon=100
mode=4
[r...@lustrefour ~]#

When I do the same from all clients and watch gnome-system-monitor, the send and receive traffic across the various nodes reaches a plateau around 209 MiB/s. Uggh.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
