I can attest that the battery in the RAID controller is a thing. I'm used to LSI controllers, but my current position has HP RAID controllers, and we just tracked down 10 of our nodes that had >100ms await pretty much all the time. They turned out to be the only 10 nodes in the cluster with failed batteries on their RAID controllers.
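If it helps, here is a rough sketch of how one could flag high-await devices from captured `iostat -x` output on each node. It assumes the older sysstat column layout shown further down in this thread (a single `await` column). The function name, the device names and the sample numbers are made up for illustration, and the 100 ms threshold is only the figure I mentioned above, not a recommended limit.

```python
def slow_devices(iostat_text, threshold_ms=100.0):
    """Return {device: worst await in ms} for devices above threshold_ms."""
    await_idx = None
    worst = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            # Find the await column from the header instead of hard-coding it,
            # since the column order differs between sysstat versions.
            await_idx = fields.index("await") if "await" in fields else None
            continue
        if await_idx is None or len(fields) <= await_idx:
            continue
        try:
            value = float(fields[await_idx])
        except ValueError:
            continue  # not a data row
        if value > threshold_ms:
            worst[fields[0]] = max(value, worst.get(fields[0], 0.0))
    return worst


# Hypothetical sample: invented devices and numbers, not taken from any real node.
SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 30.60 452.20 13.60 222.15 1000.04 8.69 14.05 0.99 14.93 2.07 100.04
sdb 0.00 0.00 16.60 55.20 0.36 0.14 14.44 0.99 113.91 9.12 115.36 13.71 98.46
"""

print(slow_devices(SAMPLE))
```

A node where every disk shows up in that result is a good candidate for a look at the controller's battery and cache state with the vendor's own tool, which is what gave ours away.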
On Thu, Oct 19, 2017, 8:15 PM Christian Balzer <[email protected]> wrote: > > Hello, > > On Thu, 19 Oct 2017 17:14:17 -0500 Russell Glaue wrote: > > > That is a good idea. > > However, a previous rebalancing processes has brought performance of our > > Guest VMs to a slow drag. > > > > Never mind that I'm not sure that these SSDs are particular well suited > for Ceph, your problem is clearly located on that one node. > > Not that I think it's the case, but make sure your PG distribution is not > skewed with many more PGs per OSD on that node. > > Once you rule that out my first guess is the RAID controller, you're > running the SSDs are single RAID0s I presume? > If so a either configuration difference or a failed BBU on the controller > could result in the writeback cache being disabled, which would explain > things beautifully. > > As for a temporary test/fix (with reduced redundancy of course), set noout > (or mon_osd_down_out_subtree_limit accordingly) and turn the slow host off. > > This should result in much better performance than you have now and of > course be the final confirmation of that host being the culprit. > > Christian > > > > > On Thu, Oct 19, 2017 at 3:55 PM, Jean-Charles Lopez <[email protected]> > > wrote: > > > > > Hi Russell, > > > > > > as you have 4 servers, assuming you are not doing EC pools, just stop > all > > > the OSDs on the second questionable server, mark the OSDs on that > server as > > > out, let the cluster rebalance and when all PGs are active+clean just > > > replay the test. > > > > > > All IOs should then go only to the other 3 servers. > > > > > > JC > > > > > > On Oct 19, 2017, at 13:49, Russell Glaue <[email protected]> wrote: > > > > > > No, I have not ruled out the disk controller and backplane making the > > > disks slower. > > > Is there a way I could test that theory, other than swapping out > hardware? 
> > > -RG > > > > > > On Thu, Oct 19, 2017 at 3:44 PM, David Turner <[email protected]> > > > wrote: > > > > > >> Have you ruled out the disk controller and backplane in the server > > >> running slower? > > >> > > >> On Thu, Oct 19, 2017 at 4:42 PM Russell Glaue <[email protected]> > wrote: > > >> > > >>> I ran the test on the Ceph pool, and ran atop on all 4 storage > servers, > > >>> as suggested. > > >>> > > >>> Out of the 4 servers: > > >>> 3 of them performed with 17% to 30% disk %busy, and 11% CPU wait. > > >>> Momentarily spiking up to 50% on one server, and 80% on another > > >>> The 2nd newest server was almost averaging 90% disk %busy and 150% > CPU > > >>> wait. And more than momentarily spiking to 101% disk busy and 250% > CPU wait. > > >>> For this 2nd newest server, this was the statistics for about 8 of 9 > > >>> disks, with the 9th disk not far behind the others. > > >>> > > >>> I cannot believe all 9 disks are bad > > >>> They are the same disks as the newest 1st server, > Crucial_CT960M500SSD1, > > >>> and same exact server hardware too. > > >>> They were purchased at the same time in the same purchase order and > > >>> arrived at the same time. > > >>> So I cannot believe I just happened to put 9 bad disks in one server, > > >>> and 9 good ones in the other. > > >>> > > >>> I know I have Ceph configured exactly the same on all servers > > >>> And I am sure I have the hardware settings configured exactly the > same > > >>> on the 1st and 2nd servers. > > >>> So if I were someone else, I would say it maybe is bad hardware on > the > > >>> 2nd server. > > >>> But the 2nd server is running very well without any hint of a > problem. > > >>> > > >>> Any other ideas or suggestions? 
> > >>> > > >>> -RG > > >>> > > >>> > > >>> On Wed, Oct 18, 2017 at 3:40 PM, Maged Mokhtar <[email protected] > > > > >>> wrote: > > >>> > > >>>> just run the same 32 threaded rados test as you did before and this > > >>>> time run atop while the test is running looking for %busy of > cpu/disks. It > > >>>> should give an idea if there is a bottleneck in them. > > >>>> > > >>>> On 2017-10-18 21:35, Russell Glaue wrote: > > >>>> > > >>>> I cannot run the write test reviewed at the > ceph-how-to-test-if-your-s > > >>>> sd-is-suitable-as-a-journal-device blog. The tests write directly to > > >>>> the raw disk device. > > >>>> Reading an infile (created with urandom) on one SSD, writing the > > >>>> outfile to another osd, yields about 17MB/s. > > >>>> But Isn't this write speed limited by the speed in which in the dd > > >>>> infile can be read? > > >>>> And I assume the best test should be run with no other load. > > >>>> > > >>>> How does one run the rados bench "as stress"? > > >>>> > > >>>> -RG > > >>>> > > >>>> > > >>>> On Wed, Oct 18, 2017 at 1:33 PM, Maged Mokhtar < > [email protected]> > > >>>> wrote: > > >>>> > > >>>>> measuring resource load as outlined earlier will show if the drives > > >>>>> are performing well or not. Also how many osds do you have ? > > >>>>> > > >>>>> On 2017-10-18 19:26, Russell Glaue wrote: > > >>>>> > > >>>>> The SSD drives are Crucial M500 > > >>>>> A Ceph user did some benchmarks and found it had good performance > > >>>>> https://forum.proxmox.com/threads/ceph-bad-performance-in- > > >>>>> qemu-guests.21551/ > > >>>>> > > >>>>> However, a user comment from 3 years ago on the blog post you > linked > > >>>>> to says to avoid the Crucial M500 > > >>>>> > > >>>>> Yet, this performance posting tells that the Crucial M500 is good. 
> > >>>>> https://inside.servers.com/ssd-performance-2017-c4307a92dea > > >>>>> > > >>>>> On Wed, Oct 18, 2017 at 11:53 AM, Maged Mokhtar < > [email protected]> > > >>>>> wrote: > > >>>>> > > >>>>>> Check out the following link: some SSDs perform bad in Ceph due to > > >>>>>> sync writes to journal > > >>>>>> > > >>>>>> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-tes > > >>>>>> t-if-your-ssd-is-suitable-as-a-journal-device/ > > >>>>>> > > >>>>>> Anther thing that can help is to re-run the rados 32 threads as > > >>>>>> stress and view resource usage using atop (or collectl/sar) to > check for > > >>>>>> %busy cpu and %busy disks to give you an idea of what is holding > down your > > >>>>>> cluster..for example: if cpu/disk % are all low then check your > > >>>>>> network/switches. If disk %busy is high (90%) for all disks then > your > > >>>>>> disks are the bottleneck: which either means you have SSDs that > are not > > >>>>>> suitable for Ceph or you have too few disks (which i doubt is the > case). If > > >>>>>> only 1 disk %busy is high, there may be something wrong with this > disk > > >>>>>> should be removed. > > >>>>>> > > >>>>>> Maged > > >>>>>> > > >>>>>> On 2017-10-18 18:13, Russell Glaue wrote: > > >>>>>> > > >>>>>> In my previous post, in one of my points I was wondering if the > > >>>>>> request size would increase if I enabled jumbo packets. currently > it is > > >>>>>> disabled. > > >>>>>> > > >>>>>> @jdillama: The qemu settings for both these two guest machines, > with > > >>>>>> RAID/LVM and Ceph/rbd images, are the same. I am not thinking > that changing > > >>>>>> the qemu settings of "min_io_size=<limited to > 16bits>,opt_io_size=<RBD > > >>>>>> image object size>" will directly address the issue. > > >>>>>> > > >>>>>> @mmokhtar: Ok. So you suggest the request size is the result of > the > > >>>>>> problem and not the cause of the problem. meaning I should go > after a > > >>>>>> different issue. 
> > >>>>>> > > >>>>>> I have been trying to get write speeds up to what people on this > mail > > >>>>>> list are discussing. > > >>>>>> It seems that for our configuration, as it matches others, we > should > > >>>>>> be getting about 70MB/s write speed. > > >>>>>> But we are not getting that. > > >>>>>> Single writes to disk are lucky to get 5MB/s to 6MB/s, but are > > >>>>>> typically 1MB/s to 2MB/s. > > >>>>>> Monitoring the entire Ceph cluster (using > > >>>>>> http://cephdash.crapworks.de/), I have seen very rare momentary > > >>>>>> spikes up to 30MB/s. > > >>>>>> > > >>>>>> My storage network is connected via a 10Gb switch > > >>>>>> I have 4 storage servers with a LSI Logic MegaRAID SAS 2208 > controller > > >>>>>> Each storage server has 9 1TB SSD drives, each drive as 1 osd (no > > >>>>>> RAID) > > >>>>>> Each drive is one LVM group, with two volumes - one volume for the > > >>>>>> osd, one volume for the journal > > >>>>>> Each osd is formatted with xfs > > >>>>>> The crush map is simple: default->rack->[host[1..4]->osd] with an > > >>>>>> evenly distributed weight > > >>>>>> The redundancy is triple replication > > >>>>>> > > >>>>>> While I have read comments that having the osd and journal on the > > >>>>>> same disk decreases write speed, I have also read that once past > 8 OSDs per > > >>>>>> node this is the recommended configuration, however this is also > the reason > > >>>>>> why SSD drives are used exclusively for OSDs in the storage nodes. > > >>>>>> None-the-less, I was still expecting write speeds to be above > 30MB/s, > > >>>>>> not below 6MB/s. > > >>>>>> Even at 12x slower than the RAID, using my previously posted > iostat > > >>>>>> data set, I should be seeing write speeds that average 10MB/s, > not 2MB/s. 
> > >>>>>> > > >>>>>> In regards to the rados benchmark tests you asked me to run, here > is > > >>>>>> the output: > > >>>>>> > > >>>>>> [centos7]# rados bench -p scbench -b 4096 30 write -t 1 > > >>>>>> Maintaining 1 concurrent writes of 4096 bytes to objects of size > 4096 > > >>>>>> for up to 30 seconds or 0 objects > > >>>>>> Object prefix: benchmark_data_hamms.sys.cu.cait.org_85049 > > >>>>>> sec Cur ops started finished avg MB/s cur MB/s last lat(s) > > >>>>>> avg lat(s) > > >>>>>> 0 0 0 0 0 0 - > > >>>>>> 0 > > >>>>>> 1 1 201 200 0.78356 0.78125 0.00522307 > > >>>>>> 0.00496574 > > >>>>>> 2 1 469 468 0.915303 1.04688 0.00437497 > > >>>>>> 0.00426141 > > >>>>>> 3 1 741 740 0.964371 1.0625 0.00512853 > > >>>>>> 0.0040434 > > >>>>>> 4 1 888 887 0.866739 0.574219 0.00307699 > > >>>>>> 0.00450177 > > >>>>>> 5 1 1147 1146 0.895725 1.01172 0.00376454 > > >>>>>> 0.0043559 > > >>>>>> 6 1 1325 1324 0.862293 0.695312 0.00459443 > > >>>>>> 0.004525 > > >>>>>> 7 1 1494 1493 0.83339 0.660156 0.00461002 > > >>>>>> 0.00458452 > > >>>>>> 8 1 1736 1735 0.847369 0.945312 0.00253971 > > >>>>>> 0.00460458 > > >>>>>> 9 1 1998 1997 0.866922 1.02344 0.00236573 > > >>>>>> 0.00450172 > > >>>>>> 10 1 2260 2259 0.882563 1.02344 0.00262179 > > >>>>>> 0.00442152 > > >>>>>> 11 1 2526 2525 0.896775 1.03906 0.00336914 > > >>>>>> 0.00435092 > > >>>>>> 12 1 2760 2759 0.898203 0.914062 0.00351827 > > >>>>>> 0.00434491 > > >>>>>> 13 1 3016 3015 0.906025 1 0.00335703 > > >>>>>> 0.00430691 > > >>>>>> 14 1 3257 3256 0.908545 0.941406 0.00332344 > > >>>>>> 0.00429495 > > >>>>>> 15 1 3490 3489 0.908644 0.910156 0.00318815 > > >>>>>> 0.00426387 > > >>>>>> 16 1 3728 3727 0.909952 0.929688 0.0032881 > > >>>>>> 0.00428895 > > >>>>>> 17 1 3986 3985 0.915703 1.00781 0.00274809 > > >>>>>> 0.0042614 > > >>>>>> 18 1 4250 4249 0.922116 1.03125 0.00287411 > > >>>>>> 0.00423214 > > >>>>>> 19 1 4505 4504 0.926003 0.996094 0.00375435 > > >>>>>> 0.00421442 > > >>>>>> 2017-10-18 10:56:31.267173 min lat: 0.00181259 
max lat: 0.270553 > avg > > >>>>>> lat: 0.00420118 > > >>>>>> sec Cur ops started finished avg MB/s cur MB/s last lat(s) > > >>>>>> avg lat(s) > > >>>>>> 20 1 4757 4756 0.928915 0.984375 0.00463972 > > >>>>>> 0.00420118 > > >>>>>> 21 1 5009 5008 0.93155 0.984375 0.00360065 > > >>>>>> 0.00418937 > > >>>>>> 22 1 5235 5234 0.929329 0.882812 0.00626214 > > >>>>>> 0.004199 > > >>>>>> 23 1 5500 5499 0.933925 1.03516 0.00466584 > > >>>>>> 0.00417836 > > >>>>>> 24 1 5708 5707 0.928861 0.8125 0.00285727 > > >>>>>> 0.00420146 > > >>>>>> 25 0 5964 5964 0.931858 1.00391 0.00417383 > > >>>>>> 0.0041881 > > >>>>>> 26 1 6216 6215 0.933722 0.980469 0.0041009 > > >>>>>> 0.00417915 > > >>>>>> 27 1 6481 6480 0.937474 1.03516 0.00307484 > > >>>>>> 0.00416118 > > >>>>>> 28 1 6745 6744 0.940819 1.03125 0.00266329 > > >>>>>> 0.00414777 > > >>>>>> 29 1 7003 7002 0.943124 1.00781 0.00305905 > > >>>>>> 0.00413758 > > >>>>>> 30 1 7271 7270 0.946578 1.04688 0.00391017 > > >>>>>> 0.00412238 > > >>>>>> Total time run: 30.006060 > > >>>>>> Total writes made: 7272 > > >>>>>> Write size: 4096 > > >>>>>> Object size: 4096 > > >>>>>> Bandwidth (MB/sec): 0.946684 > > >>>>>> Stddev Bandwidth: 0.123762 > > >>>>>> Max bandwidth (MB/sec): 1.0625 > > >>>>>> Min bandwidth (MB/sec): 0.574219 > > >>>>>> Average IOPS: 242 > > >>>>>> Stddev IOPS: 31 > > >>>>>> Max IOPS: 272 > > >>>>>> Min IOPS: 147 > > >>>>>> Average Latency(s): 0.00412247 > > >>>>>> Stddev Latency(s): 0.00648437 > > >>>>>> Max latency(s): 0.270553 > > >>>>>> Min latency(s): 0.00175318 > > >>>>>> Cleaning up (deleting benchmark objects) > > >>>>>> Clean up completed and total clean up time :29.069423 > > >>>>>> > > >>>>>> [centos7]# rados bench -p scbench -b 4096 30 write -t 32 > > >>>>>> Maintaining 32 concurrent writes of 4096 bytes to objects of size > > >>>>>> 4096 for up to 30 seconds or 0 objects > > >>>>>> Object prefix: benchmark_data_hamms.sys.cu.cait.org_86076 > > >>>>>> sec Cur ops started finished avg MB/s cur MB/s last lat(s) > > 
>>>>>> avg lat(s) > > >>>>>> 0 0 0 0 0 0 - > > >>>>>> 0 > > >>>>>> 1 32 3013 2981 11.6438 11.6445 0.00247906 > > >>>>>> 0.00572026 > > >>>>>> 2 32 5349 5317 10.3834 9.125 0.00246662 > > >>>>>> 0.00932016 > > >>>>>> 3 32 5707 5675 7.3883 1.39844 0.00389774 > > >>>>>> 0.0156726 > > >>>>>> 4 32 5895 5863 5.72481 0.734375 1.13137 > > >>>>>> 0.0167946 > > >>>>>> 5 32 6869 6837 5.34068 3.80469 0.0027652 > > >>>>>> 0.0226577 > > >>>>>> 6 32 8901 8869 5.77306 7.9375 0.0053211 > > >>>>>> 0.0216259 > > >>>>>> 7 32 10800 10768 6.00785 7.41797 0.00358187 > > >>>>>> 0.0207418 > > >>>>>> 8 32 11825 11793 5.75728 4.00391 0.00217575 > > >>>>>> 0.0215494 > > >>>>>> 9 32 12941 12909 5.6019 4.35938 0.00278512 > > >>>>>> 0.0220567 > > >>>>>> 10 32 13317 13285 5.18849 1.46875 0.0034973 > > >>>>>> 0.0240665 > > >>>>>> 11 32 16189 16157 5.73653 11.2188 0.00255841 > > >>>>>> 0.0212708 > > >>>>>> 12 32 16749 16717 5.44077 2.1875 0.00330334 > > >>>>>> 0.0215915 > > >>>>>> 13 32 16756 16724 5.02436 0.0273438 0.00338994 > > >>>>>> 0.021849 > > >>>>>> 14 32 17908 17876 4.98686 4.5 0.00402598 > > >>>>>> 0.0244568 > > >>>>>> 15 32 17936 17904 4.66171 0.109375 0.00375799 > > >>>>>> 0.0245545 > > >>>>>> 16 32 18279 18247 4.45409 1.33984 0.00483873 > > >>>>>> 0.0267929 > > >>>>>> 17 32 18372 18340 4.21346 0.363281 0.00505187 > > >>>>>> 0.0275887 > > >>>>>> 18 32 19403 19371 4.20309 4.02734 0.00545154 > > >>>>>> 0.029348 > > >>>>>> 19 31 19845 19814 4.07295 1.73047 0.00254726 > > >>>>>> 0.0306775 > > >>>>>> 2017-10-18 10:57:58.160536 min lat: 0.0015005 max lat: 2.27707 avg > > >>>>>> lat: 0.0307559 > > >>>>>> sec Cur ops started finished avg MB/s cur MB/s last lat(s) > > >>>>>> avg lat(s) > > >>>>>> 20 31 20401 20370 3.97788 2.17188 0.00307238 > > >>>>>> 0.0307559 > > >>>>>> 21 32 21338 21306 3.96254 3.65625 0.00464563 > > >>>>>> 0.0312288 > > >>>>>> 22 32 23057 23025 4.0876 6.71484 0.00296295 > > >>>>>> 0.0299267 > > >>>>>> 23 32 23057 23025 3.90988 0 - > > >>>>>> 0.0299267 > > >>>>>> 24 32 23803 
23771 3.86837 1.45703 0.00301471 > > >>>>>> 0.0312804 > > >>>>>> 25 32 24112 24080 3.76191 1.20703 0.00191063 > > >>>>>> 0.0331462 > > >>>>>> 26 31 25303 25272 3.79629 4.65625 0.00794399 > > >>>>>> 0.0329129 > > >>>>>> 27 32 28803 28771 4.16183 13.668 0.0109817 > > >>>>>> 0.0297469 > > >>>>>> 28 32 29592 29560 4.12325 3.08203 0.00188185 > > >>>>>> 0.0301911 > > >>>>>> 29 32 30595 30563 4.11616 3.91797 0.00379099 > > >>>>>> 0.0296794 > > >>>>>> 30 32 31031 30999 4.03572 1.70312 0.00283347 > > >>>>>> 0.0302411 > > >>>>>> Total time run: 30.822350 > > >>>>>> Total writes made: 31032 > > >>>>>> Write size: 4096 > > >>>>>> Object size: 4096 > > >>>>>> Bandwidth (MB/sec): 3.93282 > > >>>>>> Stddev Bandwidth: 3.66265 > > >>>>>> Max bandwidth (MB/sec): 13.668 > > >>>>>> Min bandwidth (MB/sec): 0 > > >>>>>> Average IOPS: 1006 > > >>>>>> Stddev IOPS: 937 > > >>>>>> Max IOPS: 3499 > > >>>>>> Min IOPS: 0 > > >>>>>> Average Latency(s): 0.0317779 > > >>>>>> Stddev Latency(s): 0.164076 > > >>>>>> Max latency(s): 2.27707 > > >>>>>> Min latency(s): 0.0013848 > > >>>>>> Cleaning up (deleting benchmark objects) > > >>>>>> Clean up completed and total clean up time :20.166559 > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> On Wed, Oct 18, 2017 at 8:51 AM, Maged Mokhtar < > [email protected]> > > >>>>>> wrote: > > >>>>>> > > >>>>>>> First a general comment: local RAID will be faster than Ceph for > a > > >>>>>>> single threaded (queue depth=1) io operation test. A single > thread Ceph > > >>>>>>> client will see at best same disk speed for reads and for writes > 4-6 times > > >>>>>>> slower than single disk. Not to mention the latency of local > disks will > > >>>>>>> much better. Where Ceph shines is when you have many concurrent > ios, it > > >>>>>>> scales whereas RAID will decrease speed per client as you add > more. 
> > >>>>>>> > > >>>>>>> Having said that, i would recommend running rados/rbd bench-write > > >>>>>>> and measure 4k iops at 1 and 32 threads to get a better idea of > how your > > >>>>>>> cluster performs: > > >>>>>>> > > >>>>>>> ceph osd pool create testpool 256 256 > > >>>>>>> rados bench -p testpool -b 4096 30 write -t 1 > > >>>>>>> rados bench -p testpool -b 4096 30 write -t 32 > > >>>>>>> ceph osd pool delete testpool testpool > --yes-i-really-really-mean-it > > >>>>>>> > > >>>>>>> rbd bench-write test-image --io-threads=1 --io-size 4096 > > >>>>>>> --io-pattern rand --rbd_cache=false > > >>>>>>> rbd bench-write test-image --io-threads=32 --io-size 4096 > > >>>>>>> --io-pattern rand --rbd_cache=false > > >>>>>>> > > >>>>>>> I think the request size difference you see is due to the io > > >>>>>>> scheduler in the case of local disks having more ios to re-group > so has a > > >>>>>>> better chance in generating larger requests. Depending on your > kernel, the > > >>>>>>> io scheduler may be different for rbd (blq-mq) vs sdx (cfq) but > again i > > >>>>>>> would think the request size is a result not a cause. > > >>>>>>> > > >>>>>>> Maged > > >>>>>>> > > >>>>>>> On 2017-10-17 23:12, Russell Glaue wrote: > > >>>>>>> > > >>>>>>> I am running ceph jewel on 5 nodes with SSD OSDs. > > >>>>>>> I have an LVM image on a local RAID of spinning disks. > > >>>>>>> I have an RBD image on in a pool of SSD disks. > > >>>>>>> Both disks are used to run an almost identical CentOS 7 system. > > >>>>>>> Both systems were installed with the same kickstart, though the > disk > > >>>>>>> partitioning is different. > > >>>>>>> > > >>>>>>> I want to make writes on the the ceph image faster. For example, > > >>>>>>> lots of writes to MySQL (via MySQL replication) on a ceph SSD > image are > > >>>>>>> about 10x slower than on a spindle RAID disk image. The MySQL > server on > > >>>>>>> ceph rbd image has a hard time keeping up in replication. 
> > >>>>>>> > > >>>>>>> So I wanted to test writes on these two systems > > >>>>>>> I have a 10GB compressed (gzip) file on both servers. > > >>>>>>> I simply gunzip the file on both systems, while running iostat. > > >>>>>>> > > >>>>>>> The primary difference I see in the results is the average size > of > > >>>>>>> the request to the disk. > > >>>>>>> CentOS7-lvm-raid-sata writes a lot faster to disk, and the size > of > > >>>>>>> the request is about 40x, but the number of writes per second is > about the > > >>>>>>> same > > >>>>>>> This makes me want to conclude that the smaller size of the > request > > >>>>>>> for CentOS7-ceph-rbd-ssd system is the cause of it being slow. > > >>>>>>> > > >>>>>>> > > >>>>>>> How can I make the size of the request larger for ceph rbd > images, > > >>>>>>> so I can increase the write throughput? > > >>>>>>> Would this be related to having jumbo packets enabled in my ceph > > >>>>>>> storage network? > > >>>>>>> > > >>>>>>> > > >>>>>>> Here is a sample of the results: > > >>>>>>> > > >>>>>>> [CentOS7-lvm-raid-sata] > > >>>>>>> $ gunzip large10gFile.gz & > > >>>>>>> $ iostat -x vg_root-lv_var -d 5 -m -N > > >>>>>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > > >>>>>>> avgrq-sz avgqu-sz await r_await w_await svctm %util > > >>>>>>> ... > > >>>>>>> vg_root-lv_var 0.00 0.00 30.60 452.20 13.60 > 222.15 > > >>>>>>> 1000.04 8.69 14.05 0.99 14.93 2.07 100.04 > > >>>>>>> vg_root-lv_var 0.00 0.00 88.20 182.00 39.20 > 89.43 > > >>>>>>> 974.95 4.65 9.82 0.99 14.10 3.70 100.00 > > >>>>>>> vg_root-lv_var 0.00 0.00 75.45 278.24 33.53 > 136.70 > > >>>>>>> 985.73 4.36 33.26 1.34 41.91 0.59 20.84 > > >>>>>>> vg_root-lv_var 0.00 0.00 111.60 181.80 49.60 > 89.34 > > >>>>>>> 969.84 2.60 8.87 0.81 13.81 0.13 3.90 > > >>>>>>> vg_root-lv_var 0.00 0.00 68.40 109.60 30.40 > 53.63 > > >>>>>>> 966.87 1.51 8.46 0.84 13.22 0.80 14.16 > > >>>>>>> ... 
> > >>>>>>> > > >>>>>>> [CentOS7-ceph-rbd-ssd] > > >>>>>>> $ gunzip large10gFile.gz & > > >>>>>>> $ iostat -x vg_root-lv_data -d 5 -m -N > > >>>>>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > > >>>>>>> avgrq-sz avgqu-sz await r_await w_await svctm %util > > >>>>>>> ... > > >>>>>>> vg_root-lv_data 0.00 0.00 46.40 167.80 0.88 > 1.46 > > >>>>>>> 22.36 1.23 5.66 2.47 6.54 4.52 96.82 > > >>>>>>> vg_root-lv_data 0.00 0.00 16.60 55.20 0.36 > 0.14 > > >>>>>>> 14.44 0.99 13.91 9.12 15.36 13.71 98.46 > > >>>>>>> vg_root-lv_data 0.00 0.00 69.00 173.80 1.34 > 1.32 > > >>>>>>> 22.48 1.25 5.19 3.77 5.75 3.94 95.68 > > >>>>>>> vg_root-lv_data 0.00 0.00 74.40 293.40 1.37 > 1.47 > > >>>>>>> 15.83 1.22 3.31 2.06 3.63 2.54 93.26 > > >>>>>>> vg_root-lv_data 0.00 0.00 90.80 359.00 1.96 > 3.41 > > >>>>>>> 24.45 1.63 3.63 1.94 4.05 2.10 94.38 > > >>>>>>> ... > > >>>>>>> > > >>>>>>> [iostat key] > > >>>>>>> w/s == The number (after merges) of write requests completed per > > >>>>>>> second for the device. > > >>>>>>> wMB/s == The number of sectors (kilobytes, megabytes) written to > the > > >>>>>>> device per second. > > >>>>>>> avgrq-sz == The average size (in kilobytes) of the requests that > > >>>>>>> were issued to the device. > > >>>>>>> avgqu-sz == The average queue length of the requests that were > > >>>>>>> issued to the device. 
> > >>>>>>> > > >>>>>>> > > >>>>>>> _______________________________________________ > > >>>>>>> ceph-users mailing list > > >>>>>>> [email protected] > > >>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>> > > >>>> > > >>>> > > >>> > > >>> _______________________________________________ > > >>> ceph-users mailing list > > >>> [email protected] > > >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > >>> > > >> > > > _______________________________________________ > > > ceph-users mailing list > > > [email protected] > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > > > > > > > > -- > Christian Balzer Network/Systems Engineer > [email protected] Rakuten Communications > _______________________________________________ > ceph-users mailing list > [email protected] > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >
