I wanted to report an update. We added more Ceph storage nodes, so we can take the problem OSDs out. Speeds are faster.
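For the record, taking a problem OSD out cleanly follows the usual drain-then-remove sequence; a hedged sketch for the pre-Luminous releases used in this thread (osd.8 is only an example id, taken from the dump below):

```shell
# Drain and remove one problem OSD (example id: 8), one at a time so only
# a single rebalance is in flight at once.
ceph osd out 8               # start migrating PGs off osd.8
ceph -s                      # repeat until all PGs are active+clean
systemctl stop ceph-osd@8    # stop the daemon once it holds no data
ceph osd crush remove osd.8  # remove it from the CRUSH map
ceph auth del osd.8          # delete its authentication key
ceph osd rm 8                # finally remove the OSD id
```

The exact commands vary slightly by release, so check against your version's documentation before running.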
I found a way to monitor OSD latency in Ceph, using "ceph pg dump osds".
The commit latency (fs_perf_stat/commit_latency_ms) is always "0" for us,
but the apply latency (fs_perf_stat/apply_latency_ms) shows us the slow
OSDs. The latest Ceph has a prometheus plugin
(http://docs.ceph.com/docs/master/mgr/prometheus/), so this information
can be stored and monitored over time (e.g. with Grafana) to see which
OSDs are the problem - so I don't have to deal with atop, nor run lots of
benchmark tests. (For older Ceph versions, use
https://github.com/digitalocean/ceph_exporter)

It turns out we had about 5 problem SSD drives in the slowest Ceph node,
and about 2 in the second slowest. All the other OSDs in those two
machines (the Crucial drives I reported earlier) are running below a max
of 0.02 milliseconds - so I just had a few bad drives. For the newest Ceph
nodes we added, we purchased the Kingston drives, and their latency is
below a max of 0.001 milliseconds - none are bad drives. I now see up to
28MBps write speeds, and 260MBps read speeds.

-
# ceph pg dump osds -f json-pretty
dumped osds in format json-pretty
[
    {
        "osd": 8,
        "kb": 1952015104,
        "kb_used": 1331273140,
        "kb_avail": 620741964,
        "hb_in": [
            0, 1, 2, 3, 5, 6, 11, 12, 13, 16, 17, 18, 19, 20, 21
        ],
        "hb_out": [],
        "snap_trim_queue_len": 0,
        "num_snap_trimming": 0,
        "op_queue_age_hist": {
            "histogram": [],
            "upper_bound": 1
        },
        "fs_perf_stat": {
            "commit_latency_ms": 0,
            "apply_latency_ms": 49
        }
    },
...
-

On Fri, Dec 8, 2017 at 9:20 AM, Russell Glaue <[email protected]> wrote:
> Here are some random samples I recorded in the past 30 minutes.
>
>  11 K blocks    10542 kB/s    909 op/s
>  12 K blocks    15397 kB/s   1247 op/s
>  26 K blocks    34306 kB/s   1307 op/s
>  33 K blocks    48509 kB/s   1465 op/s
>  59 K blocks    59333 kB/s    999 op/s
> 172 K blocks   101939 kB/s    590 op/s
> 104 K blocks    82605 kB/s    788 op/s
> 128 K blocks    77454 kB/s    601 op/s
> 136 K blocks    47526 kB/s    348 op/s
>
> On Fri, Dec 8, 2017 at 2:04 AM, Maged Mokhtar <[email protected]> wrote:
>
>> Correction: at 4M block sizes you will only need 22.5 iops.
>>
>> On 2017-12-08 09:59, Maged Mokhtar wrote:
>>
>> Hi Russell,
>>
>> It is probably due to the difference in block sizes used in the test vs
>> your cluster load. You have a latency problem which is limiting your max
>> write iops to around 2.5K. For large block sizes you do not need that
>> many iops; for example, if you write in 4M block sizes you will only
>> need 22.5 iops to reach your bandwidth of 90 MB/s, in which case your
>> latency problem will not affect your bandwidth. The reason I had
>> suggested you run the original test in 4k size was because this was the
>> original problem subject of this thread: the gunzip test and the small
>> block sizes you were getting with iostat.
>>
>> If you want a "rough" ballpark of what block sizes you currently see on
>> your cluster, get the total bandwidth and iops as reported by Ceph
>> ("ceph status" should give you this) and divide the first by the second.
>>
>> I still think you have a significant latency/iops issue: a 36 all-SSD
>> cluster should give much higher than 2.5K iops.
>>
>> Maged
>>
>> On 2017-12-07 23:57, Russell Glaue wrote:
>>
>> I want to provide an update to my interesting situation.
>> (New storage nodes were purchased and are going into the cluster soon)
>>
>> I have been monitoring the ceph storage nodes with atop, and read/write
>> throughput with ceph-dash, for the last month.
>> I am regularly seeing 80-90MB/s of write throughput (140MB/s read) on
>> the ceph cluster.
>> At these moments, the problem ceph node I have been speaking of shows
>> 101% disk busy on the same 3 to 4 (of the 9) OSDs. So I am getting the
>> throughput that I want on the cluster, despite the OSDs in question.
>>
>> However, when I run the bench tests described in this thread, I do not
>> see the write throughput go above 5MB/s.
>> When I take the problem node out, and run the bench tests, I see the
>> throughput double, but not over 10MB/s.
>>
>> Why is the ceph cluster getting up to 90MB/s write in the wild, but not
>> when running the bench tests?
>>
>> -RG
>>
>> On Fri, Oct 27, 2017 at 4:21 PM, Russell Glaue <[email protected]> wrote:
>>
>>> Yes, several have recommended the fio test now.
>>> I cannot perform a fio test at this time, because the post referred to
>>> directs us to write the fio test data directly to the disk device,
>>> e.g. /dev/sdj. I'd have to take an OSD completely out in order to
>>> perform the test, and I am not ready to do that at this time. Perhaps
>>> after I attempt the hardware firmware updates, and still do not have
>>> an answer, I would then take an OSD out of the cluster to run the fio
>>> test.
>>> Also, our M500 disks on the two newest machines are all running
>>> version MU05, the latest firmware. On the older two, the drives are
>>> behind a RAID0, but I suspect they might be MU03 firmware.
>>> -RG
>>>
>>> On Fri, Oct 27, 2017 at 4:12 PM, Brian Andrus <[email protected]> wrote:
>>>
>>>> I would be interested in seeing the results from the post mentioned
>>>> by an earlier contributor:
>>>>
>>>> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>>>>
>>>> Test an "old" M500 and a "new" M500 and see if the performance is A)
>>>> acceptable and B) comparable. Find the hardware revision or firmware
>>>> revision in case of A=Good and B=different.
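The journal-suitability test from the linked blog post boils down to a single-job, queue-depth-1 sync write; a hedged sketch (the device path is a placeholder, and this writes destructively to the raw device):

```shell
# O_DSYNC journal write test, per the blog post referenced above.
# DESTRUCTIVE: point DEV only at a drive holding no data you care about.
DEV=/dev/sdX   # placeholder device
fio --filename="$DEV" --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test
```

A journal-worthy SSD sustains thousands of iops here; drives that depend on a volatile cache for sync writes often collapse to a few hundred.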
>>>>
>>>> If the "old" device doesn't test well in fio/dd testing, then the
>>>> drives are (as expected) not a great choice for journals, and you
>>>> might want to look at hardware/backplane/RAID configuration
>>>> differences that are somehow allowing them to perform adequately.
>>>>
>>>> On Fri, Oct 27, 2017 at 12:36 PM, Russell Glaue <[email protected]> wrote:
>>>>
>>>>> Yes, all the M500s we use are both journal and OSD, even the older
>>>>> ones. We have a 3 year lifecycle and move older nodes from one ceph
>>>>> cluster to another.
>>>>> On old systems with 3 year old M500s, they run as RAID0, and run
>>>>> faster than our current problem system with 1 year old M500s, run as
>>>>> non-RAID pass-through on the controller.
>>>>>
>>>>> All disks are SATA and are connected to a SAS controller. We were
>>>>> wondering if the SAS/SATA conversion is an issue. Yet, the older
>>>>> systems don't exhibit a problem.
>>>>>
>>>>> I found out what I wanted to know from a colleague: when the current
>>>>> ceph cluster was put together, the SSDs tested at 300+MB/s, and the
>>>>> ceph cluster writes at 30MB/s.
>>>>>
>>>>> Using SMART tools, the reserved cells in all drives are nearly 100%.
>>>>>
>>>>> Restarting the OSDs slightly improved performance. Still betting on
>>>>> hardware issues that a firmware upgrade may resolve.
>>>>>
>>>>> -RG
>>>>>
>>>>> On Oct 27, 2017 1:14 PM, "Brian Andrus" <[email protected]> wrote:
>>>>>
>>>>> @Russell, are your "older Crucial M500"s being used as journals?
>>>>>
>>>>> Crucial M500s are not to be used as a Ceph journal, in my last
>>>>> experience with them. They make good OSDs with an NVMe in front of
>>>>> them perhaps, but not much else.
>>>>>
>>>>> Ceph uses O_DSYNC for journal writes, and these drives do not handle
>>>>> them as expected. It's been many years since I've dealt with the
>>>>> M500s specifically, but it has to do with the capacitor/power save
>>>>> feature and how it handles those types of writes.
>>>>> I'm sorry I don't have the emails with specifics around anymore, but
>>>>> last I remember, this was a hardware issue and could not be resolved
>>>>> with firmware.
>>>>>
>>>>> Paging Kyle Bader...
>>>>>
>>>>> On Fri, Oct 27, 2017 at 9:24 AM, Russell Glaue <[email protected]> wrote:
>>>>>
>>>>>> We have older Crucial M500 disks operating without such problems.
>>>>>> So, I have to believe it is a hardware firmware issue.
>>>>>> And it's peculiar seeing performance boost slightly, even 24 hours
>>>>>> later, when I stop then start the OSDs.
>>>>>>
>>>>>> Our actual writes are low, as most of our Ceph Cluster based images
>>>>>> are low-write, high-memory. So a 20GB/day life/write capacity is a
>>>>>> non-issue for us. Only write speed is the concern. Our
>>>>>> write-intensive images are locked on non-ceph disks.
>>>>>>
>>>>>> What are others using for SSD drives in their Ceph cluster?
>>>>>> With 0.50+ DWPD (Drive Writes Per Day), the Kingston SEDC400S37
>>>>>> models seem to be the best for the price today.
>>>>>>
>>>>>> On Fri, Oct 27, 2017 at 6:34 AM, Maged Mokhtar <[email protected]> wrote:
>>>>>>
>>>>>>> It is quite likely related; things are pointing to bad disks.
>>>>>>> Probably the best thing is to plan for disk replacement, the
>>>>>>> sooner the better, as it could get worse.
>>>>>>>
>>>>>>> On 2017-10-27 02:22, Christian Wuerdig wrote:
>>>>>>>
>>>>>>> Hm, not necessarily directly related to your performance problem,
>>>>>>> however: these SSDs have a listed endurance of 72TB total data
>>>>>>> written - over a 5 year period that's 40GB a day, or approx 0.04
>>>>>>> DWPD. Given that you run the journal for each OSD on the same
>>>>>>> disk, that's effectively at most 0.02 DWPD (about 20GB per day per
>>>>>>> disk). I don't know many who'd run a cluster on disks like those.
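Christian's endurance arithmetic above checks out; a quick sketch (72 TB rated endurance and the 5-year window are his figures; 960 GB is the capacity of the Crucial_CT960M500SSD1 drives named later in the thread):

```shell
# Endurance budget for a 72 TB-rated, 960 GB drive over a 5-year window.
awk 'BEGIN {
    endurance_gb = 72 * 1000; years = 5; drive_gb = 960
    per_day = endurance_gb / (years * 365)
    printf "raw budget: %.1f GB/day = %.3f DWPD\n", per_day, per_day / drive_gb
    # a colocated journal writes all data twice, halving the client budget
    printf "with colocated journal: %.1f GB/day = %.3f DWPD\n",
           per_day / 2, per_day / 2 / drive_gb
}'
```

That lands on roughly 0.04 DWPD raw and 0.02 DWPD with the journal on the same disk, matching the figures quoted above.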
>>>>>>> Also it means these are pure consumer drives, which have a habit
>>>>>>> of exhibiting random performance at times (based on unquantified
>>>>>>> anecdotal personal experience with other consumer model SSDs). I
>>>>>>> wouldn't touch these with a long stick for anything but small
>>>>>>> toy-test clusters.
>>>>>>>
>>>>>>> On Fri, Oct 27, 2017 at 3:44 AM, Russell Glaue <[email protected]> wrote:
>>>>>>>
>>>>>>> On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar <[email protected]> wrote:
>>>>>>>
>>>>>>> It depends on what stage you are in:
>>>>>>> In production, probably the best thing is to set up a monitoring
>>>>>>> tool (collectd/graphite/prometheus/grafana) to monitor both ceph
>>>>>>> stats as well as resource load. This will, among other things,
>>>>>>> show you if you have slowing disks.
>>>>>>>
>>>>>>> I am monitoring Ceph performance with ceph-dash
>>>>>>> (http://cephdash.crapworks.de/); that is why I knew to look into
>>>>>>> the slow writes issue. And I am using Monitorix
>>>>>>> (http://www.monitorix.org/) to monitor system resources, including
>>>>>>> Disk I/O.
>>>>>>>
>>>>>>> However, though I can monitor individual disk performance at the
>>>>>>> system level, it seems Ceph does not tax any disk more than the
>>>>>>> worst disk. So in my monitoring charts, all disks have the same
>>>>>>> performance.
>>>>>>> All four nodes are base-lining at 50 writes/sec during the
>>>>>>> cluster's normal load, with the non-problem hosts spiking up to
>>>>>>> 150, and the problem host only spiking up to 100.
>>>>>>> But during the window of time I took the problem host OSDs down to
>>>>>>> run the bench tests, the OSDs on the other nodes increased to
>>>>>>> 300-500 writes/sec. Otherwise, the chart looks the same for all
>>>>>>> disks on all ceph nodes/hosts.
>>>>>>>
>>>>>>> Before production, you should first make sure your SSDs are
>>>>>>> suitable for Ceph, either by being recommended by other Ceph users
>>>>>>> or by testing them yourself for sync write performance using the
>>>>>>> fio tool as outlined earlier. Then, after you build your cluster,
>>>>>>> you can use rados and/or rbd benchmark tests to benchmark your
>>>>>>> cluster and find bottlenecks using atop/sar/collectl, which will
>>>>>>> help you tune your cluster.
>>>>>>>
>>>>>>> All 36 OSDs are: Crucial_CT960M500SSD1
>>>>>>>
>>>>>>> Rados bench tests were done at the beginning. The speed was much
>>>>>>> faster than it is now. I cannot recall the test results; someone
>>>>>>> else on my team ran them. Recently, I had thought the slow disk
>>>>>>> problem was a configuration issue with Ceph - before I posted
>>>>>>> here. Now we are hoping it may be resolved with a firmware update.
>>>>>>> (If it is firmware related, rebooting the problem node may
>>>>>>> temporarily resolve this.)
>>>>>>>
>>>>>>> Though you did see better improvements, your cluster with 27 SSDs
>>>>>>> should give much higher numbers than 3k iops. If you are running
>>>>>>> rados bench while you have other client ios, then obviously the
>>>>>>> number reported by the tool will be less than what the cluster is
>>>>>>> actually giving, which you can find out via the ceph status
>>>>>>> command; it will print the total cluster throughput and iops. If
>>>>>>> the total is still low, I would recommend running the fio raw disk
>>>>>>> test; maybe the disks are not suitable. When you removed your 9
>>>>>>> bad disks from 36 and your performance doubled, you still had 2
>>>>>>> other disks slowing you... meaning near 100% busy? It makes me
>>>>>>> feel the disk type used is not good.
>>>>>>> For these near 100% busy disks, can you also measure their raw
>>>>>>> disk iops at that load? (I am not sure atop shows this; if not,
>>>>>>> use sar/sysstat/iostat/collectl.)
>>>>>>>
>>>>>>> I ran another bench test today with all 36 OSDs up. The overall
>>>>>>> performance was improved slightly compared to the original tests.
>>>>>>> Only 3 OSDs on the problem host were increasing to 101% disk busy.
>>>>>>> The iops reported from ceph status during this bench test ranged
>>>>>>> from 1.6k to 3.3k, the test yielding 4k iops.
>>>>>>>
>>>>>>> Yes, the two other OSDs/disks that were the bottleneck were at
>>>>>>> 101% disk busy. The other OSD disks on the same host were sailing
>>>>>>> along at like 50-60% busy.
>>>>>>>
>>>>>>> All 36 OSD disks are exactly the same disk. They were all
>>>>>>> purchased at the same time. All were installed at the same time.
>>>>>>> I cannot believe it is a problem with the disk model. A failed/bad
>>>>>>> disk is perhaps possible. But the disk model itself cannot be the
>>>>>>> problem based on what I am seeing. If I am seeing bad performance
>>>>>>> on all disks on one ceph node/host, but not on another ceph node
>>>>>>> with these same disks, it has to be some other factor. This is why
>>>>>>> I am now guessing a firmware upgrade is needed.
>>>>>>>
>>>>>>> Also, as I alluded to here earlier, I took down all 9 OSDs in the
>>>>>>> problem host yesterday to run the bench test.
>>>>>>> Today, with those 9 OSDs back online, I reran the bench test, and
>>>>>>> I see 2-3 OSD disks at 101% busy on the problem host, while the
>>>>>>> other disks are lower than 80%. So, for whatever reason, shutting
>>>>>>> down the OSDs and starting them back up allowed many (not all) of
>>>>>>> the OSDs' performance to improve on the problem host.
>>>>>>>
>>>>>>> Maged
>>>>>>>
>>>>>>> On 2017-10-25 23:44, Russell Glaue wrote:
>>>>>>>
>>>>>>> Thanks to all.
>>>>>>> I took the OSDs down in the problem host, without shutting down
>>>>>>> the machine. As predicted, our MB/s about doubled.
>>>>>>> Using this bench/atop procedure, I found two other OSDs on another
>>>>>>> host that are the next bottlenecks.
>>>>>>>
>>>>>>> Is this the only good way to really test the performance of the
>>>>>>> drives as OSDs? Is there any other way?
>>>>>>>
>>>>>>> While running the bench on all 36 OSDs, the 9 problem OSDs stuck
>>>>>>> out. But the two new problem OSDs I just discovered in this recent
>>>>>>> test of 27 OSDs did not stick out at all, because the ceph bench
>>>>>>> distributes the load, making only the very worst denominators show
>>>>>>> up in atop. So ceph is as slow as your slowest drive.
>>>>>>>
>>>>>>> It would be really great if I could run the bench test and somehow
>>>>>>> get the bench to use only certain OSDs during the test. Then I
>>>>>>> could run the test, avoiding the OSDs that I already know are a
>>>>>>> problem, so I can find the next worst OSD.
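On benching only certain OSDs: rados bench itself cannot pick OSDs, but it targets a pool, and a pool can be pinned to a subset of the cluster with a dedicated CRUSH rule. A hedged sketch - the rule and pool names are invented for illustration, "node3" stands for a host bucket from "ceph osd tree", and pool-create syntax varies a little across releases:

```shell
# Rule rooted at a single host bucket: all replicas land on that host's OSDs.
ceph osd crush rule create-simple bench-node3 node3 osd
# Throwaway pool mapped by that rule, then bench it: IO touches only node3.
ceph osd pool create scbench-node3 128 128 replicated bench-node3
rados bench -p scbench-node3 -b 4096 30 write -t 32
# Remove the pool and the rule when done.
ceph osd pool delete scbench-node3 scbench-node3 --yes-i-really-really-mean-it
ceph osd crush rule rm bench-node3
```

Repeating this host by host narrows the slow OSDs down without taking anything out of the cluster; note that all bench replicas share one host here, so it measures the disks, not normal placement.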
>>>>>>>
>>>>>>> [ the bench test ]
>>>>>>> rados bench -p scbench -b 4096 30 write -t 32
>>>>>>>
>>>>>>> [ original results with all 36 OSDs ]
>>>>>>> Total time run:         30.822350
>>>>>>> Total writes made:      31032
>>>>>>> Write size:             4096
>>>>>>> Object size:            4096
>>>>>>> Bandwidth (MB/sec):     3.93282
>>>>>>> Stddev Bandwidth:       3.66265
>>>>>>> Max bandwidth (MB/sec): 13.668
>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>> Average IOPS:           1006
>>>>>>> Stddev IOPS:            937
>>>>>>> Max IOPS:               3499
>>>>>>> Min IOPS:               0
>>>>>>> Average Latency(s):     0.0317779
>>>>>>> Stddev Latency(s):      0.164076
>>>>>>> Max latency(s):         2.27707
>>>>>>> Min latency(s):         0.0013848
>>>>>>> Cleaning up (deleting benchmark objects)
>>>>>>> Clean up completed and total clean up time :20.166559
>>>>>>>
>>>>>>> [ after stopping all of the OSDs (9) on the problem host ]
>>>>>>> Total time run:         32.586830
>>>>>>> Total writes made:      59491
>>>>>>> Write size:             4096
>>>>>>> Object size:            4096
>>>>>>> Bandwidth (MB/sec):     7.13131
>>>>>>> Stddev Bandwidth:       9.78725
>>>>>>> Max bandwidth (MB/sec): 29.168
>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>> Average IOPS:           1825
>>>>>>> Stddev IOPS:            2505
>>>>>>> Max IOPS:               7467
>>>>>>> Min IOPS:               0
>>>>>>> Average Latency(s):     0.0173691
>>>>>>> Stddev Latency(s):      0.21634
>>>>>>> Max latency(s):         6.71283
>>>>>>> Min latency(s):         0.00107473
>>>>>>> Cleaning up (deleting benchmark objects)
>>>>>>> Clean up completed and total clean up time :16.269393
>>>>>>>
>>>>>>> On Fri, Oct 20, 2017 at 1:35 PM, Russell Glaue <[email protected]> wrote:
>>>>>>>
>>>>>>> On the machine in question, the 2nd newest, we are using the LSI
>>>>>>> MegaRAID SAS-3 3008 [Fury], which allows us a "Non-RAID" option,
>>>>>>> and has no battery.
>>>>>>> The older two use the LSI MegaRAID SAS 2208 [Thunderbolt] I
>>>>>>> reported earlier, each single drive configured as RAID0.
>>>>>>>
>>>>>>> Thanks for everyone's help.
>>>>>>> I am going to run a 32 thread bench test after taking the 2nd
>>>>>>> machine out of the cluster with noout.
>>>>>>> After it is out of the cluster, I am expecting the slow write
>>>>>>> issue will not surface.
>>>>>>>
>>>>>>> On Fri, Oct 20, 2017 at 5:27 AM, David Turner <[email protected]> wrote:
>>>>>>>
>>>>>>> I can attest that the battery in the raid controller is a thing.
>>>>>>> I'm used to using LSI controllers, but my current position has HP
>>>>>>> raid controllers, and we just tracked down the 10 nodes that
>>>>>>> pretty much always had >100ms await: they were the only 10 nodes
>>>>>>> in the cluster with failed batteries on the raid controllers.
>>>>>>>
>>>>>>> On Thu, Oct 19, 2017, 8:15 PM Christian Balzer <[email protected]> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> On Thu, 19 Oct 2017 17:14:17 -0500 Russell Glaue wrote:
>>>>>>>
>>>>>>> That is a good idea.
>>>>>>> However, a previous rebalancing process has brought performance of
>>>>>>> our Guest VMs to a slow drag.
>>>>>>>
>>>>>>> Never mind that I'm not sure that these SSDs are particularly well
>>>>>>> suited for Ceph; your problem is clearly located on that one node.
>>>>>>>
>>>>>>> Not that I think it's the case, but make sure your PG distribution
>>>>>>> is not skewed with many more PGs per OSD on that node.
>>>>>>>
>>>>>>> Once you rule that out, my first guess is the RAID controller;
>>>>>>> you're running the SSDs as single RAID0s, I presume?
>>>>>>> If so, either a configuration difference or a failed BBU on the
>>>>>>> controller could result in the writeback cache being disabled,
>>>>>>> which would explain things beautifully.
>>>>>>>
>>>>>>> As for a temporary test/fix (with reduced redundancy of course),
>>>>>>> set noout (or mon_osd_down_out_subtree_limit accordingly) and turn
>>>>>>> the slow host off.
>>>>>>>
>>>>>>> This should result in much better performance than you have now,
>>>>>>> and of course be the final confirmation of that host being the
>>>>>>> culprit.
>>>>>>>
>>>>>>> Christian
>>>>>>>
>>>>>>> On Thu, Oct 19, 2017 at 3:55 PM, Jean-Charles Lopez <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Russell,
>>>>>>>
>>>>>>> As you have 4 servers, assuming you are not doing EC pools, just
>>>>>>> stop all the OSDs on the second questionable server, mark the OSDs
>>>>>>> on that server as out, let the cluster rebalance, and when all PGs
>>>>>>> are active+clean just replay the test.
>>>>>>>
>>>>>>> All IOs should then go only to the other 3 servers.
>>>>>>>
>>>>>>> JC
>>>>>>>
>>>>>>> On Oct 19, 2017, at 13:49, Russell Glaue <[email protected]> wrote:
>>>>>>>
>>>>>>> No, I have not ruled out the disk controller and backplane making
>>>>>>> the disks slower.
>>>>>>> Is there a way I could test that theory, other than swapping out
>>>>>>> hardware?
>>>>>>> -RG
>>>>>>>
>>>>>>> On Thu, Oct 19, 2017 at 3:44 PM, David Turner <[email protected]> wrote:
>>>>>>>
>>>>>>> Have you ruled out the disk controller and backplane in the server
>>>>>>> running slower?
>>>>>>>
>>>>>>> On Thu, Oct 19, 2017 at 4:42 PM Russell Glaue <[email protected]> wrote:
>>>>>>>
>>>>>>> I ran the test on the Ceph pool, and ran atop on all 4 storage
>>>>>>> servers, as suggested.
>>>>>>>
>>>>>>> Out of the 4 servers:
>>>>>>> 3 of them performed with 17% to 30% disk %busy and 11% CPU wait,
>>>>>>> momentarily spiking up to 50% on one server, and 80% on another.
>>>>>>> The 2nd newest server was almost averaging 90% disk %busy and 150%
>>>>>>> CPU wait, and more than momentarily spiking to 101% disk busy and
>>>>>>> 250% CPU wait.
>>>>>>> For this 2nd newest server, these were the statistics for about 8
>>>>>>> of 9 disks, with the 9th disk not far behind the others.
>>>>>>>
>>>>>>> I cannot believe all 9 disks are bad.
>>>>>>> They are the same disks as in the newest 1st server,
>>>>>>> Crucial_CT960M500SSD1, and the same exact server hardware too.
>>>>>>> They were purchased at the same time, in the same purchase order,
>>>>>>> and arrived at the same time.
>>>>>>> So I cannot believe I just happened to put 9 bad disks in one
>>>>>>> server, and 9 good ones in the other.
>>>>>>>
>>>>>>> I know I have Ceph configured exactly the same on all servers.
>>>>>>> And I am sure I have the hardware settings configured exactly the
>>>>>>> same on the 1st and 2nd servers.
>>>>>>> So if I were someone else, I would say it maybe is bad hardware on
>>>>>>> the 2nd server.
>>>>>>> But the 2nd server is running very well without any hint of a
>>>>>>> problem.
>>>>>>>
>>>>>>> Any other ideas or suggestions?
>>>>>>>
>>>>>>> -RG
>>>>>>>
>>>>>>> On Wed, Oct 18, 2017 at 3:40 PM, Maged Mokhtar <[email protected]> wrote:
>>>>>>>
>>>>>>> Just run the same 32 threaded rados test as you did before, and
>>>>>>> this time run atop while the test is running, looking for %busy of
>>>>>>> cpu/disks. It should give an idea if there is a bottleneck in
>>>>>>> them.
>>>>>>>
>>>>>>> On 2017-10-18 21:35, Russell Glaue wrote:
>>>>>>>
>>>>>>> I cannot run the write test reviewed at the
>>>>>>> ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device blog.
>>>>>>> The tests write directly to the raw disk device.
>>>>>>> Reading an infile (created with urandom) on one SSD, and writing
>>>>>>> the outfile to another osd, yields about 17MB/s.
>>>>>>> But isn't this write speed limited by the speed at which the dd
>>>>>>> infile can be read?
>>>>>>> And I assume the best test should be run with no other load.
>>>>>>>
>>>>>>> How does one run the rados bench "as stress"?
>>>>>>>
>>>>>>> -RG
>>>>>>>
>>>>>>> On Wed, Oct 18, 2017 at 1:33 PM, Maged Mokhtar <[email protected]> wrote:
>>>>>>>
>>>>>>> Measuring resource load as outlined earlier will show if the
>>>>>>> drives are performing well or not. Also, how many osds do you
>>>>>>> have?
>>>>>>>
>>>>>>> On 2017-10-18 19:26, Russell Glaue wrote:
>>>>>>>
>>>>>>> The SSD drives are Crucial M500.
>>>>>>> A Ceph user did some benchmarks and found it had good performance:
>>>>>>> https://forum.proxmox.com/threads/ceph-bad-performance-in-qemu-guests.21551/
>>>>>>>
>>>>>>> However, a user comment from 3 years ago on the blog post you
>>>>>>> linked to says to avoid the Crucial M500.
>>>>>>>
>>>>>>> Yet, this performance posting tells that the Crucial M500 is good:
>>>>>>> https://inside.servers.com/ssd-performance-2017-c4307a92dea
>>>>>>>
>>>>>>> On Wed, Oct 18, 2017 at 11:53 AM, Maged Mokhtar <[email protected]> wrote:
>>>>>>>
>>>>>>> Check out the following link: some SSDs perform badly in Ceph due
>>>>>>> to sync writes to the journal:
>>>>>>>
>>>>>>> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>>>>>>>
>>>>>>> Another thing that can help is to re-run the rados 32 threads as
>>>>>>> stress and view resource usage using atop (or collectl/sar) to
>>>>>>> check for %busy cpu and %busy disks, to give you an idea of what
>>>>>>> is holding down your cluster. For example: if cpu/disk % are all
>>>>>>> low, then check your network/switches. If disk %busy is high (90%)
>>>>>>> for all disks, then your disks are the bottleneck: which either
>>>>>>> means you have SSDs that are not suitable for Ceph, or you have
>>>>>>> too few disks (which I doubt is the case). If only 1 disk %busy is
>>>>>>> high, there may be something wrong with this disk and it should be
>>>>>>> removed.
>>>>>>>
>>>>>>> Maged
>>>>>>>
>>>>>>> On 2017-10-18 18:13, Russell Glaue wrote:
>>>>>>>
>>>>>>> In my previous post, in one of my points, I was wondering if the
>>>>>>> request size would increase if I enabled jumbo packets. Currently
>>>>>>> it is disabled.
>>>>>>>
>>>>>>> @jdillama: The qemu settings for both these two guest machines,
>>>>>>> with RAID/LVM and Ceph/rbd images, are the same. I am not thinking
>>>>>>> that changing the qemu settings of "min_io_size=<limited to
>>>>>>> 16bits>,opt_io_size=<RBD image object size>" will directly address
>>>>>>> the issue.
>>>>>>>
>>>>>>> @mmokhtar: Ok. So you suggest the request size is the result of
>>>>>>> the problem and not the cause of the problem, meaning I should go
>>>>>>> after a different issue.
>>>>>>>
>>>>>>> I have been trying to get write speeds up to what people on this
>>>>>>> mail list are discussing.
>>>>>>> It seems that for our configuration, as it matches others, we
>>>>>>> should be getting about 70MB/s write speed.
>>>>>>> But we are not getting that.
>>>>>>> Single writes to disk are lucky to get 5MB/s to 6MB/s, but are
>>>>>>> typically 1MB/s to 2MB/s.
>>>>>>> Monitoring the entire Ceph cluster (using
>>>>>>> http://cephdash.crapworks.de/), I have seen very rare momentary
>>>>>>> spikes up to 30MB/s.
>>>>>>>
>>>>>>> My storage network is connected via a 10Gb switch.
>>>>>>> I have 4 storage servers with a LSI Logic MegaRAID SAS 2208
>>>>>>> controller.
>>>>>>> Each storage server has 9 1TB SSD drives, each drive as 1 osd (no
>>>>>>> RAID).
>>>>>>> Each drive is one LVM group, with two volumes - one volume for the
>>>>>>> osd, one volume for the journal.
>>>>>>> Each osd is formatted with xfs.
>>>>>>> The crush map is simple: default->rack->[host[1..4]->osd] with an
>>>>>>> evenly distributed weight.
>>>>>>> The redundancy is triple replication.
>>>>>>>
>>>>>>> While I have read comments that having the osd and journal on the
>>>>>>> same disk decreases write speed, I have also read that once past 8
>>>>>>> OSDs per node this is the recommended configuration; however, this
>>>>>>> is also the reason why SSD drives are used exclusively for OSDs in
>>>>>>> the storage nodes.
>>>>>>> Nonetheless, I was still expecting write speeds to be above
>>>>>>> 30MB/s, not below 6MB/s.
>>>>>>> Even at 12x slower than the RAID, using my previously posted
>>>>>>> iostat data set, I should be seeing write speeds that average
>>>>>>> 10MB/s, not 2MB/s.
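As a sanity check on those expectations, a common rule of thumb (not from the thread) caps aggregate client write bandwidth at raw disk bandwidth divided by replication and the colocated-journal double write; a sketch with assumed numbers (100 MB/s sustained sync-write per SSD is a guess, not a measured value from this cluster):

```shell
# Ballpark aggregate write ceiling: disks * per-disk MB/s / (replicas * 2),
# where the factor 2 is the colocated journal writing everything twice.
awk 'BEGIN {
    disks = 36; per_disk_mbps = 100; replicas = 3
    printf "aggregate ceiling: %.0f MB/s\n", disks * per_disk_mbps / (replicas * 2)
}'
```

Even with a conservative per-disk figure, this lands far above the 30MB/s observed, which is consistent with a few pathological disks dragging the whole cluster down.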
Regarding the rados benchmark tests you asked me to run, here is the output:

[centos7]# rados bench -p scbench -b 4096 30 write -t 1
Maintaining 1 concurrent writes of 4096 bytes to objects of size 4096 for up to 30 seconds or 0 objects
Object prefix: benchmark_data_hamms.sys.cu.cait.org_85049
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1       1       201       200   0.78356   0.78125   0.00522307  0.00496574
    2       1       469       468  0.915303   1.04688   0.00437497  0.00426141
    3       1       741       740  0.964371    1.0625   0.00512853   0.0040434
    4       1       888       887  0.866739  0.574219   0.00307699  0.00450177
    5       1      1147      1146  0.895725   1.01172   0.00376454   0.0043559
    6       1      1325      1324  0.862293  0.695312   0.00459443    0.004525
    7       1      1494      1493   0.83339  0.660156   0.00461002  0.00458452
    8       1      1736      1735  0.847369  0.945312   0.00253971  0.00460458
    9       1      1998      1997  0.866922   1.02344   0.00236573  0.00450172
   10       1      2260      2259  0.882563   1.02344   0.00262179  0.00442152
   11       1      2526      2525  0.896775   1.03906   0.00336914  0.00435092
   12       1      2760      2759  0.898203  0.914062   0.00351827  0.00434491
   13       1      3016      3015  0.906025         1   0.00335703  0.00430691
   14       1      3257      3256  0.908545  0.941406   0.00332344  0.00429495
   15       1      3490      3489  0.908644  0.910156   0.00318815  0.00426387
   16       1      3728      3727  0.909952  0.929688    0.0032881  0.00428895
   17       1      3986      3985  0.915703   1.00781   0.00274809   0.0042614
   18       1      4250      4249  0.922116   1.03125   0.00287411  0.00423214
   19       1      4505      4504  0.926003  0.996094   0.00375435  0.00421442
2017-10-18 10:56:31.267173 min lat: 0.00181259 max lat: 0.270553 avg lat: 0.00420118
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
   20       1      4757      4756  0.928915  0.984375   0.00463972  0.00420118
   21       1      5009      5008   0.93155  0.984375   0.00360065  0.00418937
   22       1      5235      5234  0.929329  0.882812   0.00626214    0.004199
   23       1      5500      5499  0.933925   1.03516   0.00466584  0.00417836
   24       1      5708      5707  0.928861    0.8125   0.00285727  0.00420146
   25       0      5964      5964  0.931858   1.00391   0.00417383   0.0041881
   26       1      6216      6215  0.933722  0.980469    0.0041009  0.00417915
   27       1      6481      6480  0.937474   1.03516   0.00307484  0.00416118
   28       1      6745      6744  0.940819   1.03125   0.00266329  0.00414777
   29       1      7003      7002  0.943124   1.00781   0.00305905  0.00413758
   30       1      7271      7270  0.946578   1.04688   0.00391017  0.00412238
Total time run:         30.006060
Total writes made:      7272
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     0.946684
Stddev Bandwidth:       0.123762
Max bandwidth (MB/sec): 1.0625
Min bandwidth (MB/sec): 0.574219
Average IOPS:           242
Stddev IOPS:            31
Max IOPS:               272
Min IOPS:               147
Average Latency(s):     0.00412247
Stddev Latency(s):      0.00648437
Max latency(s):         0.270553
Min latency(s):         0.00175318
Cleaning up (deleting benchmark objects)
Clean up completed and total clean up time: 29.069423

[centos7]# rados bench -p scbench -b 4096 30 write -t 32
Maintaining 32 concurrent writes of 4096 bytes to objects of size 4096 for up to 30 seconds or 0 objects
Object prefix: benchmark_data_hamms.sys.cu.cait.org_86076
  sec Cur ops   started  finished  avg MB/s   cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0          0            -           0
    1      32      3013      2981   11.6438    11.6445   0.00247906  0.00572026
    2      32      5349      5317   10.3834      9.125   0.00246662  0.00932016
    3      32      5707      5675    7.3883    1.39844   0.00389774   0.0156726
    4      32      5895      5863   5.72481   0.734375      1.13137   0.0167946
    5      32      6869      6837   5.34068    3.80469    0.0027652   0.0226577
    6      32      8901      8869   5.77306     7.9375    0.0053211   0.0216259
    7      32     10800     10768   6.00785    7.41797   0.00358187   0.0207418
    8      32     11825     11793   5.75728    4.00391   0.00217575   0.0215494
    9      32     12941     12909    5.6019    4.35938   0.00278512   0.0220567
   10      32     13317     13285   5.18849    1.46875    0.0034973   0.0240665
   11      32     16189     16157   5.73653    11.2188   0.00255841   0.0212708
   12      32     16749     16717   5.44077     2.1875   0.00330334   0.0215915
   13      32     16756     16724   5.02436  0.0273438   0.00338994    0.021849
   14      32     17908     17876   4.98686        4.5   0.00402598   0.0244568
   15      32     17936     17904   4.66171   0.109375   0.00375799   0.0245545
   16      32     18279     18247   4.45409    1.33984   0.00483873   0.0267929
   17      32     18372     18340   4.21346   0.363281   0.00505187   0.0275887
   18      32     19403     19371   4.20309    4.02734   0.00545154    0.029348
   19      31     19845     19814   4.07295    1.73047   0.00254726   0.0306775
2017-10-18 10:57:58.160536 min lat: 0.0015005 max lat: 2.27707 avg lat: 0.0307559
  sec Cur ops   started  finished  avg MB/s   cur MB/s  last lat(s)  avg lat(s)
   20      31     20401     20370   3.97788    2.17188   0.00307238   0.0307559
   21      32     21338     21306   3.96254    3.65625   0.00464563   0.0312288
   22      32     23057     23025    4.0876    6.71484   0.00296295   0.0299267
   23      32     23057     23025   3.90988          0            -   0.0299267
   24      32     23803     23771   3.86837    1.45703   0.00301471   0.0312804
   25      32     24112     24080   3.76191    1.20703   0.00191063   0.0331462
   26      31     25303     25272   3.79629    4.65625   0.00794399   0.0329129
   27      32     28803     28771   4.16183     13.668    0.0109817   0.0297469
   28      32     29592     29560   4.12325    3.08203   0.00188185   0.0301911
   29      32     30595     30563   4.11616    3.91797   0.00379099   0.0296794
   30      32     31031     30999   4.03572    1.70312   0.00283347   0.0302411
Total time run:         30.822350
Total writes made:      31032
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     3.93282
Stddev Bandwidth:       3.66265
Max bandwidth (MB/sec): 13.668
Min bandwidth (MB/sec): 0
Average IOPS:           1006
Stddev IOPS:            937
Max IOPS:               3499
Min IOPS:               0
Average Latency(s):     0.0317779
Stddev Latency(s):      0.164076
Max latency(s):         2.27707
Min latency(s):         0.0013848
Cleaning up (deleting benchmark objects)
Clean up completed and total clean up time: 20.166559


On Wed, Oct 18, 2017 at 8:51 AM, Maged Mokhtar <[email protected]> wrote:

First, a general comment: local RAID will be faster than Ceph for a single-threaded (queue depth = 1) I/O test. A single-threaded Ceph client will at best match single-disk speed for reads, and for writes will be 4-6 times slower than a single disk. Not to mention the latency of local disks will be much better. Where Ceph shines is when you have many concurrent I/Os: it scales, whereas RAID delivers less speed per client as you add more clients.
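To put the latency point above in numbers: a latency-bound client completes roughly (queue depth / average latency) writes per second. A minimal back-of-envelope sketch checking this against the two bench runs quoted above (latency figures copied from the bench summaries; the helper name is mine):

```python
# Rough model: a client keeping Q writes in flight against a store whose
# average write latency is L seconds completes about Q / L writes per second.
def iops_ceiling(queue_depth: float, avg_latency_s: float) -> float:
    return queue_depth / avg_latency_s

# -t 1 run: avg latency 0.00412247 s -> ~243 IOPS (bench reported 242 avg IOPS)
print(round(iops_ceiling(1, 0.00412247)))
# -t 32 run: avg latency 0.0317779 s -> ~1007 IOPS (bench reported 1006 avg IOPS)
print(round(iops_ceiling(32, 0.0317779)))
```

Note that 32x the concurrency bought only ~4x the IOPS, because average latency inflated from ~4 ms to ~32 ms; that inflation is the latency problem capping write IOPS.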
Having said that, I would recommend running rados bench and rbd bench-write and measuring 4k IOPS at 1 and 32 threads to get a better idea of how your cluster performs:

ceph osd pool create testpool 256 256
rados bench -p testpool -b 4096 30 write -t 1
rados bench -p testpool -b 4096 30 write -t 32
ceph osd pool delete testpool testpool --yes-i-really-really-mean-it

rbd bench-write test-image --io-threads=1 --io-size 4096 --io-pattern rand --rbd_cache=false
rbd bench-write test-image --io-threads=32 --io-size 4096 --io-pattern rand --rbd_cache=false

I think the request-size difference you see is due to the I/O scheduler: with local disks it has more I/Os to re-group, so it has a better chance of generating larger requests. Depending on your kernel, the I/O scheduler may be different for rbd (blk-mq) vs sdX (cfq), but again I would think the request size is a result, not a cause.

Maged

On 2017-10-17 23:12, Russell Glaue wrote:

I am running ceph jewel on 5 nodes with SSD OSDs.
I have an LVM image on a local RAID of spinning disks.
I have an RBD image in a pool of SSD disks.
Both disks are used to run an almost identical CentOS 7 system. Both systems were installed with the same kickstart, though the disk partitioning is different.

I want to make writes on the ceph image faster. For example, lots of writes to MySQL (via MySQL replication) on a ceph SSD image are about 10x slower than on a spindle RAID disk image. The MySQL server on the ceph RBD image has a hard time keeping up in replication.
So I wanted to test writes on these two systems.
I have a 10GB compressed (gzip) file on both servers, and I simply gunzip the file on both systems while running iostat.

The primary difference I see in the results is the average size of the requests issued to the disk. CentOS7-lvm-raid-sata writes a lot faster to disk, and its requests are about 40x larger, while the number of writes per second is about the same. This makes me want to conclude that the smaller request size on the CentOS7-ceph-rbd-ssd system is the cause of it being slow.

How can I make the request size larger for ceph rbd images, so I can increase the write throughput? Would this be related to having jumbo frames enabled in my ceph storage network?

Here is a sample of the results:

[CentOS7-lvm-raid-sata]
$ gunzip large10gFile.gz &
$ iostat -x vg_root-lv_var -d 5 -m -N
Device:         rrqm/s  wrqm/s     r/s     w/s  rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm   %util
...
vg_root-lv_var    0.00    0.00   30.60  452.20  13.60  222.15   1000.04      8.69  14.05     0.99    14.93   2.07  100.04
vg_root-lv_var    0.00    0.00   88.20  182.00  39.20   89.43    974.95      4.65   9.82     0.99    14.10   3.70  100.00
vg_root-lv_var    0.00    0.00   75.45  278.24  33.53  136.70    985.73      4.36  33.26     1.34    41.91   0.59   20.84
vg_root-lv_var    0.00    0.00  111.60  181.80  49.60   89.34    969.84      2.60   8.87     0.81    13.81   0.13    3.90
vg_root-lv_var    0.00    0.00   68.40  109.60  30.40   53.63    966.87      1.51   8.46     0.84    13.22   0.80   14.16
...
[CentOS7-ceph-rbd-ssd]
$ gunzip large10gFile.gz &
$ iostat -x vg_root-lv_data -d 5 -m -N
Device:          rrqm/s  wrqm/s    r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
...
vg_root-lv_data    0.00    0.00  46.40  167.80   0.88   1.46     22.36      1.23   5.66     2.47     6.54   4.52  96.82
vg_root-lv_data    0.00    0.00  16.60   55.20   0.36   0.14     14.44      0.99  13.91     9.12    15.36  13.71  98.46
vg_root-lv_data    0.00    0.00  69.00  173.80   1.34   1.32     22.48      1.25   5.19     3.77     5.75   3.94  95.68
vg_root-lv_data    0.00    0.00  74.40  293.40   1.37   1.47     15.83      1.22   3.31     2.06     3.63   2.54  93.26
vg_root-lv_data    0.00    0.00  90.80  359.00   1.96   3.41     24.45      1.63   3.63     1.94     4.05   2.10  94.38
...

[iostat key]
w/s == The number (after merges) of write requests completed per second for the device.
wMB/s == The number of megabytes written to the device per second.
avgrq-sz == The average size (in 512-byte sectors) of the requests that were issued to the device.
avgqu-sz == The average queue length of the requests that were issued to the device.
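The ~40x request-size gap can be sanity-checked straight from the iostat columns: throughput divided by request rate gives the mean request size (iostat reports avgrq-sz in 512-byte sectors, not kilobytes). A small sketch using the first sample row from each system above (the helper name is mine):

```python
# Derive the mean request size from iostat's rate columns and convert to
# 512-byte sectors so it can be compared against the avgrq-sz column.
def avg_request_sectors(r_s, w_s, rmb_s, wmb_s):
    mb_per_req = (rmb_s + wmb_s) / (r_s + w_s)  # MB moved per request
    return mb_per_req * 1024 * 1024 / 512       # MB -> 512-byte sectors

# First vg_root-lv_var sample: ~1000 sectors (~489 KB/request); avgrq-sz was 1000.04
print(round(avg_request_sectors(30.60, 452.20, 13.60, 222.15)))
# First vg_root-lv_data sample: ~22 sectors (~11 KB/request); avgrq-sz was 22.36
print(round(avg_request_sectors(46.40, 167.80, 0.88, 1.46)))
```

So the RAID volume is absorbing ~0.5 MB requests while the rbd volume sees ~11 KB requests at a similar write rate, which is exactly the 40x difference described above.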
--
Christian Balzer        Network/Systems Engineer
[email protected]        Rakuten Communications

--
Brian Andrus | Cloud Systems Engineer | DreamHost
[email protected] | www.dreamhost.com

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
