On Thu, 24 Apr 2014 13:51:49 +0800 Indra Pramana wrote:

> Hi Christian,
>
> Good day to you, and thank you for your reply.
>
> On Wed, Apr 23, 2014 at 11:41 PM, Christian Balzer <[email protected]> wrote:
>
> > > > > Using 32 concurrent writes, result is below. The speed really
> > > > > fluctuates.
> > > > >
> > > > > Total time run:         64.317049
> > > > > Total writes made:      1095
> > > > > Write size:             4194304
> > > > > Bandwidth (MB/sec):     68.100
> > > > >
> > > > > Stddev Bandwidth:       44.6773
> > > > > Max bandwidth (MB/sec): 184
> > > > > Min bandwidth (MB/sec): 0
> > > > > Average Latency:        1.87761
> > > > > Stddev Latency:         1.90906
> > > > > Max latency:            9.99347
> > > > > Min latency:            0.075849
> > > > >
> > > > That is really weird, it should get faster, not slower. ^o^
> > > > I assume you've run this a number of times?
> > > >
> > > > Also, my apologies, the default is 16 threads, not 1, but that still
> > > > isn't enough to get my cluster to full speed:
> > > > ---
> > > > Bandwidth (MB/sec):     349.044
> > > >
> > > > Stddev Bandwidth:       107.582
> > > > Max bandwidth (MB/sec): 408
> > > > ---
> > > > At 64 threads it will ramp up from a slow start to:
> > > > ---
> > > > Bandwidth (MB/sec):     406.967
> > > >
> > > > Stddev Bandwidth:       114.015
> > > > Max bandwidth (MB/sec): 452
> > > > ---
> > > >
> > > > But what stands out is your latency. I don't have a 10GbE network
> > > > to compare, but my Infiniband-based cluster (going through at
> > > > least one switch) gives me values like this:
> > > > ---
> > > > Average Latency:        0.335519
> > > > Stddev Latency:         0.177663
> > > > Max latency:            1.37517
> > > > Min latency:            0.1017
> > > > ---
> > > >
> > > > Of course that latency is not just the network.
> > > >
> > >
> > > What else can contribute to this latency? Storage node load, disk
> > > speed, anything else?
> > >
> > That and the network itself are pretty much it; you should know once
> > you've run those tests with atop or iostat on the storage nodes.
> >
> > > > I would suggest running atop (it gives you more information at one
> > > > glance) or "iostat -x 3" on all your storage nodes during these
> > > > tests to identify any node or OSD that is overloaded in some way.
> > > >
> > >
> > > Will try.
> > >
> > Do that and let us know about the results.
> >
>
> I have done some tests using iostat and noted some OSDs on a particular
> storage node going up to the 100% limit when I run the rados bench test.
>
Dumping lots of text will make people skip over your mails; you need to
summarize and preferably understand yourself what these numbers mean.
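
For the record, the test and the monitoring being discussed boil down to
something like this ("rbd" is just an example pool name here, use whichever
pool you have been benchmarking against):
---
# from a client: 4MB object writes for 60 seconds with 64 concurrent operations
rados -p rbd bench 60 write -t 64

# on each storage node, in a second terminal, while the benchmark runs
iostat -x 3     # per-disk throughput, await and %util every 3 seconds
atop 3          # the same plus CPU, memory and network in one screen
---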

The iostat output is not too conclusive, as the numbers when reaching 100%
utilization are not particularly impressive.
The fact that it happens though should make you look for anything different
with these OSDs, from smartctl checks to PG distribution, as in "ceph pg
dump" and then tallying up each PG.
Also look at "ceph osd tree" and see if those OSDs or that node have a
higher weight than the others.

The atop line indicates that sdb was being read at a rate of 100MB/s, and
assuming that your benchmark was more or less the only thing running at
that time, this would mean something very odd is going on, as all the
other OSDs had no significant reads going on and all were being written
to at about the same speed.
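
Concretely, those checks are nothing more than something along these lines
(the awk field number is only a placeholder -- look at the header of the
"ceph pg dump" output in your version and adjust it; sdb is the disk that
stood out on ceph-osd-07):
---
# dump the PG map once and note which column holds the acting set
ceph pg dump > /tmp/pgdump.txt
head -2 /tmp/pgdump.txt

# tally how many PGs end up on each OSD (adjust $14 to the "acting" column)
awk '$1 ~ /^[0-9]+\./ {print $14}' /tmp/pgdump.txt | tr -d '[]' | tr ',' '\n' | sort -n | uniq -c

# weights and placement per host/OSD
ceph osd tree

# SMART health of the disk behind the suspect OSD
smartctl -a /dev/sdb
---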

Christian

> ====
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.09    0.00    0.92   21.74    0.00   76.25
>
> Device:  rrqm/s  wrqm/s     r/s     w/s   rkB/s      wkB/s avgrq-sz avgqu-sz    await r_await  w_await   svctm    %util
> sda        0.00    0.00    4.33   42.00   73.33    6980.00   304.46     0.29     6.22    0.00     6.86    1.50     6.93
> sdb        0.00    0.00    0.00   17.67    0.00    6344.00   718.19    59.64   854.26    0.00   854.26   56.60  *100.00*
> sdc        0.00    0.00   12.33   59.33   70.67   18882.33   528.92    36.54   509.80   64.76   602.31   10.51    75.33
> sdd        0.00    0.00    3.33   54.33   24.00   15249.17   529.71     1.29    22.45    3.20    23.63    1.64     9.47
> sde        0.00    0.33    0.00    0.67    0.00       4.00    12.00     0.30   450.00    0.00   450.00  450.00    30.00
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.38    0.00    1.13    7.75    0.00   89.74
>
> Device:  rrqm/s  wrqm/s     r/s     w/s   rkB/s      wkB/s avgrq-sz avgqu-sz    await r_await  w_await   svctm    %util
> sda        0.00    0.00    5.00   69.00   30.67   19408.50   525.38     4.29    58.02    0.53    62.18    2.00    14.80
> sdb        0.00    0.00    7.00   63.33   41.33   20911.50   595.82    13.09   826.96   88.57   908.57    5.48    38.53
> sdc        0.00    0.00    2.67   30.00   17.33    6945.33   426.29     0.21     6.53    0.50     7.07    1.59     5.20
> sdd        0.00    0.00    2.67   58.67   16.00   20661.33   674.26     4.89    79.54   41.00    81.30    2.70    16.53
> sde        0.00    0.00    0.00    1.67    0.00       6.67     8.00     0.01     3.20    0.00     3.20    1.60     0.27
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.97    0.00    0.55    6.73    0.00   91.75
>
> Device:  rrqm/s  wrqm/s     r/s     w/s   rkB/s      wkB/s avgrq-sz avgqu-sz    await r_await  w_await   svctm    %util
> sda        0.00    0.00    1.67   15.33   21.33     120.00    16.63     0.02     1.18    0.00     1.30    0.63     1.07
> sdb        0.00    0.00    4.33   62.33   24.00   13299.17   399.69     2.68    11.18    1.23    11.87    1.94    12.93
> sdc        0.00    0.00    0.67   38.33   70.67    7881.33   407.79    37.66   202.15    0.00   205.67   13.61    53.07
> sdd        0.00    0.00    3.00   17.33   12.00     166.00    17.51     0.05     2.89    3.11     2.85    0.98     2.00
> sde        0.00    0.00    0.00    0.00    0.00       0.00     0.00     0.00     0.00    0.00     0.00    0.00     0.00
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.29    0.00    0.92   24.10    0.00   73.68
>
> Device:  rrqm/s  wrqm/s     r/s     w/s   rkB/s      wkB/s avgrq-sz avgqu-sz    await r_await  w_await   svctm    %util
> sda        0.00    0.00    0.00   45.33    0.00    4392.50   193.79     0.62    13.62    0.00    13.62    1.09     4.93
> sdb        0.00    0.00    0.00    8.67    0.00    3600.00   830.77    63.87  1605.54    0.00  1605.54  115.38  *100.00*
> sdc        0.00    0.33    8.67   42.67   37.33    5672.33   222.45    16.88   908.78    1.38  1093.09    7.06    36.27
> sdd        0.00    0.00    0.33   31.00    1.33     629.83    40.29     0.06     1.91    0.00     1.94    0.94     2.93
> sde        0.00    0.00    0.00    0.33    0.00       1.33     8.00     0.12   368.00    0.00   368.00  368.00    12.27
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.59    0.00    0.88    4.82    0.00   92.70
>
> Device:  rrqm/s  wrqm/s     r/s     w/s   rkB/s      wkB/s avgrq-sz avgqu-sz    await r_await  w_await   svctm    %util
> sda        0.00    0.00    0.00   29.00    0.00     235.00    16.21     0.06     1.98    0.00     1.98    0.97     2.80
> sdb        0.00    6.00    4.33  114.67   38.67    6422.33   108.59     9.19   513.19  265.23   522.56    2.08    24.80
> sdc        0.00    0.00    0.00   20.67    0.00     124.00    12.00     0.04     2.00    0.00     2.00    1.03     2.13
> sdd        0.00    5.00    1.67   81.00   12.00     546.17    13.50     0.10     1.21    0.80     1.22    0.39     3.20
> sde        0.00    0.00    0.00    0.00    0.00       0.00     0.00     0.00     0.00    0.00     0.00    0.00     0.00
> ====
>
> And the high utilisation is randomly affecting other OSDs as well within
> the same node, and not only affecting one particular OSD.
>
> atop result on the node:
>
> ====
> ATOP - ceph-osd-07    2014/04/24  13:49:12    ------    10s elapsed
>
> PRC | sys   1.77s | user  2.11s | #proc   164 | #trun     2 | #tslpi 2817 | #tslpu    0 | #zombie   0 | clones    4 | #exit     0 |
> CPU | sys     14% | user    20% | irq      1% | idle   632% | wait    133% | steal   0% | guest    0% | avgf 1.79GHz | avgscal 54% |
> cpu | sys      6% | user     7% | irq      0% | idle    19% | cpu006 w 68% | steal   0% | guest    0% | avgf 2.42GHz | avgscal 73% |
> cpu | sys      2% | user     3% | irq      0% | idle    88% | cpu002 w  7% | steal   0% | guest    0% | avgf 1.68GHz | avgscal 50% |
> cpu | sys      2% | user     2% | irq      0% | idle    86% | cpu003 w 10% | steal   0% | guest    0% | avgf 1.67GHz | avgscal 50% |
> cpu | sys      2% | user     2% | irq      0% | idle    75% | cpu001 w 21% | steal   0% | guest    0% | avgf 1.83GHz | avgscal 55% |
> cpu | sys      1% | user     2% | irq      1% | idle    70% | cpu000 w 26% | steal   0% | guest    0% | avgf 1.85GHz | avgscal 56% |
> cpu | sys      1% | user     2% | irq      0% | idle    97% | cpu004 w  1% | steal   0% | guest    0% | avgf 1.64GHz | avgscal 49% |
> cpu | sys      1% | user     1% | irq      0% | idle    98% | cpu005 w  0% | steal   0% | guest    0% | avgf 1.60GHz | avgscal 48% |
> cpu | sys      0% | user     1% | irq      0% | idle    98% | cpu007 w  0% | steal   0% | guest    0% | avgf 1.60GHz | avgscal 48% |
> CPL | avg1   1.12 | avg5   0.90 | avg15  0.72 | csw  103682 | intr  34330 | numcpu    8 |
> MEM | tot   15.6G | free 158.2M | cache 13.7G | dirty 101.4M | buff  18.2M | slab 574.6M |
> SWP | tot  518.0M | free 489.6M | vmcom  5.2G | vmlim  8.3G |
> PAG | scan 327450 | stall     0 | swin      0 | swout     0 |
> DSK | sdb | busy  90% | read  8115 | write  695 | KiB/r 130 | KiB/w 194 | MBr/s 103.34 | MBw/s 13.22 | avq  4.61 | avio 1.01 ms |
> DSK | sdc | busy  32% | read    23 | write  431 | KiB/r   6 | KiB/w 318 | MBr/s   0.02 | MBw/s 13.41 | avq 34.86 | avio 6.95 ms |
> DSK | sda | busy  32% | read    25 | write  674 | KiB/r   6 | KiB/w 193 | MBr/s   0.02 | MBw/s 12.76 | avq 41.00 | avio 4.48 ms |
> DSK | sdd | busy   7% | read    26 | write  473 | KiB/r   7 | KiB/w 223 | MBr/s   0.02 | MBw/s 10.31 | avq 14.29 | avio 1.45 ms |
> DSK | sde | busy   2% | read     0 | write    5 | KiB/r   0 | KiB/w   5 | MBr/s   0.00 | MBw/s  0.00 | avq  1.00 | avio 44.8 ms |
> NET | transport | tcpi 21326 | tcpo 27479 | udpi 0 | udpo 0 | tcpao 0 | tcppo 2 | tcprs 3 | tcpie 0 | tcpor 0 | udpnp 0 | udpip 0 |
> NET | network   | ipi  21326 | ipo  14340 | ipfrw 0 | deliv 21326 | icmpi 0 | icmpo 0 |
> NET | p2p2 ---- | pcki 12659 | pcko 20931 | si  124 Mbps | so  107 Mbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
> NET | p2p1 ---- | pcki  8565 | pcko  6443 | si  106 Mbps | so 7911 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
> NET | lo   ---- | pcki   108 | pcko   108 | si    8 Kbps | so    8 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
>
>   PID  RUID   EUID   THR  SYSCPU  USRCPU  VGROW  RGROW   RDDSK    WRDSK  ST EXC S CPUNR  CPU CMD        1/1
>  6881  root   root   538   0.74s   0.94s     0K   256K    1.0G   121.3M  --   - S     3  17% ceph-osd
> 28708  root   root   720   0.30s   0.69s   512K    -8K    160K   157.7M  --   - S     3  10% ceph-osd
> 31569  root   root   678   0.21s   0.30s   512K  -584K    156K   162.7M  --   - S     0   5% ceph-osd
> 32095  root   root   654   0.14s   0.16s     0K     0K     60K   105.9M  --   - S     0   3% ceph-osd
>    61  root   root     1   0.20s   0.00s     0K     0K      0K       0K  --   - S     3   2% kswapd0
> 10584  root   root     1   0.03s   0.02s   112K   112K      0K       0K  --   - R     4   1% atop
> 11618  root   root     1   0.03s   0.00s     0K     0K      0K       0K  --   - S     6   0% kworker/6:2
>    10  root   root     1   0.02s   0.00s     0K     0K      0K       0K  --   - S     0   0% rcu_sched
>    38  root   root     1   0.01s   0.00s     0K     0K      0K       0K  --   - S     6   0% ksoftirqd/6
>  1623  root   root     1   0.01s   0.00s     0K     0K      0K       0K  --   - S     6   0% kworker/6:1H
>  1993  root   root     1   0.01s   0.00s     0K     0K      0K       0K  --   - S     2   0% flush-8:48
>  2031  root   root     1   0.01s   0.00s     0K     0K      0K       0K  --   - S     2   0% flush-8:0
>  2032  root   root     1   0.01s   0.00s     0K     0K      0K       0K  --   - S     0   0% flush-8:16
>  2033  root   root     1   0.01s   0.00s     0K     0K      0K       0K  --   - S     2   0% flush-8:32
>  5787  root   root     1   0.01s   0.00s     0K     0K      4K       0K  --   - S     3   0% kworker/3:0
> 27605  root   root     1   0.01s   0.00s     0K     0K      0K       0K  --   - S     1   0% kworker/1:2
> 27823  root   root     1   0.01s   0.00s     0K     0K      0K       0K  --   - S     0   0% kworker/0:2
> 32511  root   root     1   0.01s   0.00s     0K     0K      0K       0K  --   - S     2   0% kworker/2:0
>  1536  root   root     1   0.00s   0.00s     0K     0K      0K       0K  --   - S     2   0% irqbalance
>   478  root   root     1   0.00s   0.00s     0K     0K      0K       0K  --   - S     3   0% usb-storage
>   494  root   root     1   0.00s   0.00s     0K     0K      0K       0K  --   - S     1   0% jbd2/sde1-8
>  1550  root   root     1   0.00s   0.00s     0K     0K    400K       0K  --   - S     1   0% xfsaild/sdb1
>  1750  root   root     1   0.00s   0.00s     0K     0K    128K       0K  --   - S     2   0% xfsaild/sdd1
>  1994  root   root     1   0.00s   0.00s     0K     0K      0K       0K  --   - S     1   0% flush-8:64
> ====
>
> I have tried to trim the SSD drives but the problem seems to persist.
> Last time, trimming the SSD drives helped to improve the performance.
>
> Any advice is greatly appreciated.
>
> Thank you.


--
Christian Balzer        Network/Systems Engineer
[email protected]           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
