Hi Nick,

Thanks for the detailed response and insight. SSDs are definitely on the to-buy list.
I will certainly try to rule out any hardware issues in the meantime.

Cheers,
Lincoln

> On Sep 17, 2015, at 12:53 PM, Nick Fisk <[email protected]> wrote:
>
> It's probably helped, but I fear that your overall design is not going to
> work well for you. Cache tier + base tier + journals on the same disks is
> going to really hurt.
>
> The problem when using cache tiering (especially with EC pools in future
> releases) is that to modify a block that isn't in the cache tier you have
> to promote it first, which often kicks another block out of the cache.
>
> So in the worst case, a single write could incur:
>
> R from EC -> W to CT + jrnl W -> W actual data to CT + jrnl W -> R from CT ->
> W to EC + jrnl W
>
> Plus any metadata updates. Either way, you're looking at somewhere near
> 10x write amplification for 4MB writes, which will quickly overload your
> disks, leading to very slow performance. Smaller I/Os would still cause
> 4MB blocks to be shifted between pools. What makes it worse is that these
> promotions/evictions tend to happen on hot PGs rather than being spread
> around the whole cluster, meaning that a single hot OSD can hold up writes
> across the whole pool.
>
> I know it's not what you want to hear, but I can't think of anything you
> can do to help in this instance unless you are willing to get some SSD
> journals and maybe move the cache pool onto separate disks or SSDs.
> Basically, try to limit the amount of random I/O the disks have to do.
>
> Of course, please do try to find a time to stop all IO and then run the
> test on the 3-way test pool, to rule out any hardware/OS issues.
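To make the arithmetic behind the ~10x estimate concrete, the promotion/eviction chain above can be counted out. This is a rough sketch, not from the thread itself: the per-step costs are assumptions, and EC parity shards and metadata updates (which push the total toward the ~10x figure) are ignored.

```shell
# Rough count of the disk I/O behind one 4 MB client write that misses
# the cache tier, following the chain:
#   R from EC -> W to CT + jrnl W -> W data to CT + jrnl W
#   -> R from CT -> W to EC + jrnl W
BLOCK_MB=4        # promotions/evictions move whole 4 MB objects

# Writes: evicted object to EC + its journal write, promoted object into
# the cache tier + its journal write, client data + its journal write.
WRITE_STEPS=6
# Reads: object being promoted (from EC), object being evicted (from CT).
READ_STEPS=2

echo "client write: ${BLOCK_MB} MB"
echo "disk writes:  $((WRITE_STEPS * BLOCK_MB)) MB (${WRITE_STEPS}x amplification)"
echo "disk reads:   $((READ_STEPS * BLOCK_MB)) MB"
```

So even before EC parity shards and metadata, one 4 MB client write can turn into 24 MB of journal-doubled writes plus 8 MB of reads on the same spindles.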
>
>> -----Original Message-----
>> From: ceph-users [mailto:[email protected]] On Behalf Of Lincoln Bryant
>> Sent: 17 September 2015 18:36
>> To: Nick Fisk <[email protected]>
>> Cc: [email protected]
>> Subject: Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked
>>
>> Just a small update — the blocked ops did disappear after doubling the
>> target_max_bytes. We'll see if it sticks! I've thought I've solved this
>> blocked ops problem about 10 times now :)
>>
>> Assuming this is the issue, is there any workaround for this problem (or
>> is it working as intended)? (Should I set up a cron to run
>> cache-try-flush-evict-all every night? :))
>>
>> Another curious thing is that a rolling restart of all OSDs also seems to
>> fix the problem — for a time. I'm not sure how that would fit in if this
>> is the problem.
>>
>> —Lincoln
>>
>>> On Sep 17, 2015, at 12:07 PM, Lincoln Bryant <[email protected]> wrote:
>>>
>>> We have CephFS utilizing a cache tier + EC backend. The cache tier and
>>> EC pool sit on the same spinners — no SSDs. Our cache tier has a
>>> target_max_bytes of 5TB and the total storage is about 1PB.
>>>
>>> I do have a separate test pool with 3x replication and no cache tier,
>>> and I still see significant performance drops and blocked ops with
>>> no/minimal client I/O from CephFS. Right now I have 530 blocked ops with
>>> 20MB/s of client write I/O and no active scrubs.
>>> The rados bench on my test pool looks like this:
>>>
>>> sec Cur ops started finished  avg MB/s  cur MB/s  last lat   avg lat
>>>   0       0       0        0         0         0         -         0
>>>   1      31      94       63   251.934       252   0.31017  0.217719
>>>   2      31     103       72   143.969        36  0.978544  0.260631
>>>   3      31     103       72   95.9815         0         -  0.260631
>>>   4      31     111       80   79.9856        16   2.29218  0.476458
>>>   5      31     112       81   64.7886         4    2.5559   0.50213
>>>   6      31     112       81   53.9905         0         -   0.50213
>>>   7      31     115       84   47.9917         6   3.71826  0.615882
>>>   8      31     115       84   41.9928         0         -  0.615882
>>>   9      31     115       84    37.327         0         -  0.615882
>>>  10      31     117       86   34.3942   2.66667   6.73678  0.794532
>>>
>>> I'm really leaning more toward it being a weird controller/disk problem.
>>>
>>> As a test, I suppose I could double the target_max_bytes, just so the
>>> cache tier stops evicting while client I/O is writing?
>>>
>>> —Lincoln
>>>
>>>> On Sep 17, 2015, at 11:59 AM, Nick Fisk <[email protected]> wrote:
>>>>
>>>> Ah right... this is where it gets interesting.
>>>>
>>>> You are probably hitting a full cache on a PG somewhere, which is
>>>> either making everything wait until it flushes, or something like that.
>>>>
>>>> What cache settings have you got set?
>>>>
>>>> I assume you have SSDs for the cache tier? Can you share the size of
>>>> the pool?
>>>>
>>>> If possible, could you also create a non-tiered test pool and do some
>>>> benchmarks on that to rule out any issue with the hardware and OSDs.
>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto:[email protected]] On Behalf Of Lincoln Bryant
>>>>> Sent: 17 September 2015 17:54
>>>>> To: Nick Fisk <[email protected]>
>>>>> Cc: [email protected]
>>>>> Subject: Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked
>>>>>
>>>>> Hi Nick,
>>>>>
>>>>> Thanks for responding. Yes, I am.
>>>>>
>>>>> —Lincoln
>>>>>
>>>>>> On Sep 17, 2015, at 11:53 AM, Nick Fisk <[email protected]> wrote:
>>>>>>
>>>>>> You are getting a fair amount of reads on the disks whilst doing
>>>>>> these writes.
>>>>>> You're not using cache tiering, are you?
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: ceph-users [mailto:[email protected]] On Behalf Of Lincoln Bryant
>>>>>>> Sent: 17 September 2015 17:42
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked
>>>>>>>
>>>>>>> Hello again,
>>>>>>>
>>>>>>> Well, I disabled offloads on the NIC -- didn't work for me. I also
>>>>>>> tried setting net.ipv4.tcp_moderate_rcvbuf = 0 as suggested
>>>>>>> elsewhere in the thread, to no avail.
>>>>>>>
>>>>>>> Today I was watching iostat on an OSD box ('iostat -xm 5') when the
>>>>>>> cluster got into the "slow" state:
>>>>>>>
>>>>>>> Device: rrqm/s  wrqm/s    r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
>>>>>>> sdb       0.00   13.57  84.23  167.47   0.45   2.78    26.26     2.06   8.18   3.85  96.93
>>>>>>> sdc       0.00   46.71   5.59  289.22   0.03   2.54    17.85     3.18  10.77   0.97  28.72
>>>>>>> sdd       0.00   16.57  45.11   91.62   0.25   0.55    12.01     0.75   5.51   2.45  33.47
>>>>>>> sde       0.00   13.57   6.99  143.31   0.03   2.53    34.97     1.99  13.27   2.12  31.86
>>>>>>> sdf       0.00   18.76   4.99  158.48   0.10   1.09    14.88     1.26   7.69   1.24  20.26
>>>>>>> sdg       0.00   25.55  81.64  237.52   0.44   2.89    21.36     4.14  12.99   2.58  82.22
>>>>>>> sdh       0.00   89.42  16.17  492.42   0.09   3.81    15.69    17.12  33.66   0.73  36.95
>>>>>>> sdi       0.00   20.16  17.76  189.62   0.10   1.67    17.46     3.45  16.63   1.57  32.55
>>>>>>> sdj       0.00   31.54   0.00  185.23   0.00   1.91    21.15     3.33  18.00   0.03   0.62
>>>>>>> sdk       0.00   26.15   2.40  133.33   0.01   0.84    12.79     1.07   7.87   0.85  11.58
>>>>>>> sdl       0.00   25.55   9.38  123.95   0.05   1.15    18.44     0.50   3.74   1.58  21.10
>>>>>>> sdm       0.00    6.39  92.61   47.11   0.47   0.26    10.65     1.27   9.07   6.92  96.73
>>>>>>>
>>>>>>> The %util is rather high on some disks, but I'm not an expert at
>>>>>>> reading iostat, so I'm not sure how worrisome this is. Does anything
>>>>>>> here stand out to anyone?
>>>>>>>
>>>>>>> At the time of that iostat, Ceph was reporting a lot of blocked ops
>>>>>>> on the OSD associated with sde (as well as on about 30 other OSDs),
>>>>>>> but it doesn't look all that busy. Some simple 'dd' tests seem to
>>>>>>> indicate the disk is fine.
>>>>>>>
>>>>>>> Similarly, iotop seems OK on this host:
>>>>>>>
>>>>>>>     TID PRIO USER  DISK READ  DISK WRITE  SWAPIN    IO>  COMMAND
>>>>>>>  472477 be/4 root   0.00 B/s    5.59 M/s  0.00 %  0.57 % ceph-osd -i 111 --pid-file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  470621 be/4 root   0.00 B/s   10.09 M/s  0.00 %  0.40 % ceph-osd -i 111 --pid-file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3495447 be/4 root   0.00 B/s  272.19 K/s  0.00 %  0.36 % ceph-osd -i 114 --pid-file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3488389 be/4 root   0.00 B/s  596.80 K/s  0.00 %  0.16 % ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3488060 be/4 root   0.00 B/s  600.83 K/s  0.00 %  0.15 % ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3505573 be/4 root   0.00 B/s  528.25 K/s  0.00 %  0.10 % ceph-osd -i 119 --pid-file /var/run/ceph/osd.119.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3495434 be/4 root   0.00 B/s    2.02 K/s  0.00 %  0.10 % ceph-osd -i 114 --pid-file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3502327 be/4 root   0.00 B/s  506.07 K/s  0.00 %  0.09 % ceph-osd -i 118 --pid-file /var/run/ceph/osd.118.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3489100 be/4 root   0.00 B/s  106.86 K/s  0.00 %  0.09 % ceph-osd -i 110 --pid-file /var/run/ceph/osd.110.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3496631 be/4 root   0.00 B/s  229.85 K/s  0.00 %  0.05 % ceph-osd -i 115 --pid-file /var/run/ceph/osd.115.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3505561 be/4 root   0.00 B/s    2.02 K/s  0.00 %  0.03 % ceph-osd -i 119 --pid-file /var/run/ceph/osd.119.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3488059 be/4 root   0.00 B/s    2.02 K/s  0.00 %  0.03 % ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3488391 be/4 root  46.37 K/s  431.47 K/s  0.00 %  0.02 % ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3500639 be/4 root   0.00 B/s  221.78 K/s  0.00 %  0.02 % ceph-osd -i 117 --pid-file /var/run/ceph/osd.117.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3488392 be/4 root  34.28 K/s  185.49 K/s  0.00 %  0.02 % ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>> 3488062 be/4 root   4.03 K/s   66.54 K/s  0.00 %  0.02 % ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>
>>>>>>> These are all 6TB Seagates in single-disk RAID 0 on a PERC H730
>>>>>>> Mini controller.
>>>>>>>
>>>>>>> I did try removing the disk with 20k non-medium errors, but that
>>>>>>> didn't seem to help.
>>>>>>>
>>>>>>> Thanks for any insight!
>>>>>>> Cheers,
>>>>>>> Lincoln Bryant
>>>>>>>
>>>>>>>> On Sep 9, 2015, at 1:09 PM, Lincoln Bryant <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi Jan,
>>>>>>>>
>>>>>>>> I'll take a look at all of those things and report back (hopefully :))
>>>>>>>>
>>>>>>>> I did try setting all of my OSDs to writethrough instead of
>>>>>>>> writeback on the controller, which was significantly more
>>>>>>>> consistent in performance (from 1100MB/s down to 300MB/s, but still
>>>>>>>> occasionally dropping to 0MB/s). Still plenty of blocked ops.
>>>>>>>>
>>>>>>>> I was wondering if not-so-nicely failing OSD(s) might be the cause.
>>>>>>>> My controller (PERC H730 Mini) seems frustratingly terse with SMART
>>>>>>>> information, but at least one disk has a "Non-medium error count"
>>>>>>>> of over 20,000.
>>>>>>>>
>>>>>>>> I'll try disabling offloads as well.
>>>>>>>>
>>>>>>>> Thanks much for the suggestions!
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Lincoln
>>>>>>>>
>>>>>>>>> On Sep 9, 2015, at 3:59 AM, Jan Schermer <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Just to recapitulate - the nodes are doing "nothing" when it drops
>>>>>>>>> to zero? Not flushing something to drives (iostat)? Not cleaning
>>>>>>>>> pagecache (kswapd and similar)? Not out of any type of memory
>>>>>>>>> (slab, min_free_kbytes)? No network link errors, no bad checksums
>>>>>>>>> (those are hard to spot, though)?
>>>>>>>>>
>>>>>>>>> Unless you find something, I suggest you try disabling offloads on
>>>>>>>>> the NICs and see if the problem goes away.
>>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>>> On 08 Sep 2015, at 18:26, Lincoln Bryant <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> For whatever it's worth, my problem has returned and is very
>>>>>>>>>> similar to yours. Still trying to figure out what's going on
>>>>>>>>>> over here.
>>>>>>>>>>
>>>>>>>>>> Performance is nice for a few seconds, then goes to 0. This is a
>>>>>>>>>> similar setup to yours (12 OSDs per box, Scientific Linux 6,
>>>>>>>>>> Ceph 0.94.3, etc.)
>>>>>>>>>>
>>>>>>>>>> sec Cur ops started finished  avg MB/s  cur MB/s   last lat   avg lat
>>>>>>>>>> 384      16   29520    29504   307.287      1188  0.0492006  0.208259
>>>>>>>>>> 385      16   29813    29797   309.532      1172  0.0469708  0.206731
>>>>>>>>>> 386      16   30105    30089   311.756      1168  0.0375764  0.205189
>>>>>>>>>> 387      16   30401    30385   314.009      1184   0.036142  0.203791
>>>>>>>>>> 388      16   30695    30679   316.231      1176  0.0372316  0.202355
>>>>>>>>>> 389      16   30987    30971    318.42      1168  0.0660476  0.200962
>>>>>>>>>> 390      16   31282    31266   320.628      1180  0.0358611  0.199548
>>>>>>>>>> 391      16   31568    31552   322.734      1144  0.0405166  0.198132
>>>>>>>>>> 392      16   31857    31841   324.859      1156  0.0360826  0.196679
>>>>>>>>>> 393      16   32090    32074   326.404       932  0.0416869   0.19549
>>>>>>>>>> 394      16   32205    32189   326.743       460  0.0251877  0.194896
>>>>>>>>>> 395      16   32302    32286   326.897       388  0.0280574  0.194395
>>>>>>>>>> 396      16   32348    32332   326.537       184  0.0256821  0.194157
>>>>>>>>>> 397      16   32385    32369   326.087       148  0.0254342  0.193965
>>>>>>>>>> 398      16   32424    32408   325.659       156  0.0263006  0.193763
>>>>>>>>>> 399      16   32445    32429   325.054        84  0.0233839  0.193655
>>>>>>>>>> 2015-09-08 11:22:31.940164 min lat: 0.0165045 max lat: 67.6184 avg lat: 0.193655
>>>>>>>>>> sec Cur ops started finished  avg MB/s  cur MB/s   last lat   avg lat
>>>>>>>>>> 400      16   32445    32429   324.241         0          -  0.193655
>>>>>>>>>> 401      16   32445    32429   323.433         0          -  0.193655
>>>>>>>>>> 402      16   32445    32429   322.628         0          -  0.193655
>>>>>>>>>> 403      16   32445    32429   321.828         0          -  0.193655
>>>>>>>>>> 404      16   32445    32429   321.031         0          -  0.193655
>>>>>>>>>> 405      16   32445    32429   320.238         0          -  0.193655
>>>>>>>>>> 406      16   32445    32429    319.45         0          -  0.193655
>>>>>>>>>> 407      16   32445    32429   318.665         0          -  0.193655
>>>>>>>>>>
>>>>>>>>>> Needless to say, very strange.
>>>>>>>>>>
>>>>>>>>>> —Lincoln
>>>>>>>>>>
>>>>>>>>>>> On Sep 7, 2015, at 3:35 PM, Vickey Singh <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Adding ceph-users.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 7, 2015 at 11:31 PM, Vickey Singh <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 7, 2015 at 10:04 PM, Udo Lembke <[email protected]> wrote:
>>>>>>>>>>> Hi Vickey,
>>>>>>>>>>> Thanks for your time in replying to my problem.
>>>>>>>>>>>
>>>>>>>>>>> I had the same rados bench output after changing the motherboard
>>>>>>>>>>> of the monitor node with the lowest IP... Due to the new
>>>>>>>>>>> mainboard, I assume the hw-clock was wrong during startup. Ceph
>>>>>>>>>>> health showed no errors, but all the VMs weren't able to do IO
>>>>>>>>>>> (very high load on the VMs - but no traffic). I stopped the mon,
>>>>>>>>>>> but that didn't change anything. I had to restart all the other
>>>>>>>>>>> mons to get IO again. After that I started the first mon again
>>>>>>>>>>> (with the right time now) and all worked fine again...
>>>>>>>>>>>
>>>>>>>>>>> Thanks, I will try to restart all OSDs / MONs and report back if
>>>>>>>>>>> it solves my problem.
>>>>>>>>>>>
>>>>>>>>>>> Another possibility:
>>>>>>>>>>> Do you use journals on SSDs? Perhaps the SSDs stall on garbage
>>>>>>>>>>> collection?
>>>>>>>>>>>
>>>>>>>>>>> No, I don't have journals on SSDs; they are on the same disk as
>>>>>>>>>>> the OSD.
>>>>>>>>>>>
>>>>>>>>>>> Udo
>>>>>>>>>>>
>>>>>>>>>>> On 07.09.2015 16:36, Vickey Singh wrote:
>>>>>>>>>>>> Dear Experts,
>>>>>>>>>>>>
>>>>>>>>>>>> Can someone please help me figure out why my cluster is not
>>>>>>>>>>>> able to write data.
>>>>>>>>>>>>
>>>>>>>>>>>> See the output below: cur MB/s is 0 and avg MB/s is decreasing.
>>>>>>>>>>>>
>>>>>>>>>>>> Ceph Hammer 0.94.2
>>>>>>>>>>>> CentOS 6 (3.10.69-1)
>>>>>>>>>>>>
>>>>>>>>>>>> The Ceph status says OPS are blocked. I have tried checking
>>>>>>>>>>>> everything I know:
>>>>>>>>>>>>
>>>>>>>>>>>> - System resources (CPU, net, disk, memory) -- all normal
>>>>>>>>>>>> - 10G network for public and cluster network -- no saturation
>>>>>>>>>>>> - All disks are physically healthy
>>>>>>>>>>>> - No messages in /var/log/messages or dmesg
>>>>>>>>>>>> - Tried restarting the OSDs that are blocking operations, but no luck
>>>>>>>>>>>> - Tried writing through RBD and rados bench; both give the same problem
>>>>>>>>>>>>
>>>>>>>>>>>> Please help me to fix this problem.
>>>>>>>>>>>>
>>>>>>>>>>>> # rados bench -p rbd 60 write
>>>>>>>>>>>> Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or 0 objects
>>>>>>>>>>>> Object prefix: benchmark_data_stor1_1791844
>>>>>>>>>>>> sec Cur ops started finished  avg MB/s  cur MB/s  last lat    avg lat
>>>>>>>>>>>>   0      0        0        0         0         0         -          0
>>>>>>>>>>>>   1     16      125      109   435.873       436  0.022076  0.0697864
>>>>>>>>>>>>   2     16      139      123   245.948        56  0.246578  0.0674407
>>>>>>>>>>>>   3     16      139      123   163.969         0         -  0.0674407
>>>>>>>>>>>>   4     16      139      123   122.978         0         -  0.0674407
>>>>>>>>>>>>   5     16      139      123    98.383         0         -  0.0674407
>>>>>>>>>>>>   6     16      139      123   81.9865         0         -  0.0674407
>>>>>>>>>>>>   7     16      139      123   70.2747         0         -  0.0674407
>>>>>>>>>>>>   8     16      139      123   61.4903         0         -  0.0674407
>>>>>>>>>>>>   9     16      139      123   54.6582         0         -  0.0674407
>>>>>>>>>>>>  10     16      139      123   49.1924         0         -  0.0674407
>>>>>>>>>>>>  11     16      139      123   44.7201         0         -  0.0674407
>>>>>>>>>>>>  12     16      139      123   40.9934         0         -  0.0674407
>>>>>>>>>>>>  13     16      139      123   37.8401         0         -  0.0674407
>>>>>>>>>>>>  14     16      139      123   35.1373         0         -  0.0674407
>>>>>>>>>>>>  15     16      139      123   32.7949         0         -  0.0674407
>>>>>>>>>>>>  16     16      139      123   30.7451         0         -  0.0674407
>>>>>>>>>>>>  17     16      139      123   28.9364         0         -  0.0674407
>>>>>>>>>>>>  18     16      139      123   27.3289         0         -  0.0674407
>>>>>>>>>>>>  19     16      139      123   25.8905         0         -  0.0674407
>>>>>>>>>>>> 2015-09-07 15:54:52.694071 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
>>>>>>>>>>>> sec Cur ops started finished  avg MB/s  cur MB/s  last lat    avg lat
>>>>>>>>>>>>  20     16      139      123    24.596         0         -  0.0674407
>>>>>>>>>>>>  21     16      139      123   23.4247         0         -  0.0674407
>>>>>>>>>>>>  22     16      139      123     22.36         0         -  0.0674407
>>>>>>>>>>>>  23     16      139      123   21.3878         0         -  0.0674407
>>>>>>>>>>>>  24     16      139      123   20.4966         0         -  0.0674407
>>>>>>>>>>>>  25     16      139      123   19.6768         0         -  0.0674407
>>>>>>>>>>>>  26     16      139      123     18.92         0         -  0.0674407
>>>>>>>>>>>>  27     16      139      123   18.2192         0         -  0.0674407
>>>>>>>>>>>>  28     16      139      123   17.5686         0         -  0.0674407
>>>>>>>>>>>>  29     16      139      123   16.9628         0         -  0.0674407
>>>>>>>>>>>>  30     16      139      123   16.3973         0         -  0.0674407
>>>>>>>>>>>>  31     16      139      123   15.8684         0         -  0.0674407
>>>>>>>>>>>>  32     16      139      123   15.3725         0         -  0.0674407
>>>>>>>>>>>>  33     16      139      123   14.9067         0         -  0.0674407
>>>>>>>>>>>>  34     16      139      123   14.4683         0         -  0.0674407
>>>>>>>>>>>>  35     16      139      123   14.0549         0         -  0.0674407
>>>>>>>>>>>>  36     16      139      123   13.6645         0         -  0.0674407
>>>>>>>>>>>>  37     16      139      123   13.2952         0         -  0.0674407
>>>>>>>>>>>>  38     16      139      123   12.9453         0         -  0.0674407
>>>>>>>>>>>>  39     16      139      123   12.6134         0         -  0.0674407
>>>>>>>>>>>> 2015-09-07 15:55:12.697124 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
>>>>>>>>>>>> sec Cur ops started finished  avg MB/s  cur MB/s  last lat    avg lat
>>>>>>>>>>>>  40     16      139      123   12.2981         0         -  0.0674407
>>>>>>>>>>>>  41     16      139      123   11.9981         0         -  0.0674407
>>>>>>>>>>>>
>>>>>>>>>>>> cluster 86edf8b8-b353-49f1-ab0a-a4827a9ea5e8
>>>>>>>>>>>>  health HEALTH_WARN
>>>>>>>>>>>>         1 requests are blocked > 32 sec
>>>>>>>>>>>>  monmap e3: 3 mons at {stor0111=10.100.1.111:6789/0,stor0113=10.100.1.113:6789/0,stor0115=10.100.1.115:6789/0}
>>>>>>>>>>>>         election epoch 32, quorum 0,1,2 stor0111,stor0113,stor0115
>>>>>>>>>>>>  osdmap e19536: 50 osds: 50 up, 50 in
>>>>>>>>>>>>  pgmap v928610: 2752 pgs, 9 pools, 30476 GB data, 4183 kobjects
>>>>>>>>>>>>        91513 GB used, 47642 GB / 135 TB avail
>>>>>>>>>>>>            2752 active+clean
>>>>>>>>>>>>
>>>>>>>>>>>> Tried using RBD:
>>>>>>>>>>>>
>>>>>>>>>>>> # dd if=/dev/zero of=file1 bs=4K count=10000 oflag=direct
>>>>>>>>>>>> 10000+0 records in
>>>>>>>>>>>> 10000+0 records out
>>>>>>>>>>>> 40960000 bytes (41 MB) copied, 24.5529 s, 1.7 MB/s
>>>>>>>>>>>>
>>>>>>>>>>>> # dd if=/dev/zero of=file1 bs=1M count=100 oflag=direct
>>>>>>>>>>>> 100+0 records in
>>>>>>>>>>>> 100+0 records out
>>>>>>>>>>>> 104857600 bytes (105 MB) copied, 1.05602 s, 9.3 MB/s
>>>>>>>>>>>>
>>>>>>>>>>>> # dd if=/dev/zero of=file1 bs=1G count=1 oflag=direct
>>>>>>>>>>>> 1+0 records in
>>>>>>>>>>>> 1+0 records out
>>>>>>>>>>>> 1073741824 bytes (1.1 GB) copied, 293.551 s, 3.7 MB/s

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
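The diagnostic and tuning steps discussed in this thread can be collected into one place. This is a sketch under assumptions: the pool names (`testpool`, `cephfs-cache`) and the 10 TB target are placeholders for your own values, and the commands need a live cluster, so treat this as a checklist rather than a script.

```shell
# Benchmark a non-tiered replicated pool to rule out hardware/OS issues
# (Nick's suggestion), ideally while other client I/O is stopped.
rados bench -p testpool 60 write

# Show which requests are blocked, and on which OSDs.
ceph health detail

# Double the cache tier's target_max_bytes (5 TB -> 10 TB in the thread)
# so the cache stops evicting while client I/O is writing.
ceph osd pool set cephfs-cache target_max_bytes $((10 * 1024 * 1024 * 1024 * 1024))

# Manually flush and evict the cache tier (the nightly-cron idea from the
# thread); expect heavy disk I/O on the cache tier while it runs.
rados -p cephfs-cache cache-try-flush-evict-all
```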
