Hi Christian,

Thank you for the follow-up on this. I answered those questions inline below.

Have a good day,
Lewis George
----------------------------------------
From: "Christian Balzer" <[email protected]>
Sent: Thursday, August 18, 2016 6:31 PM
To: [email protected]
Cc: "[email protected]" <[email protected]>
Subject: Re: [ceph-users] Understanding write performance

Hello,

On Thu, 18 Aug 2016 12:03:36 -0700 [email protected] wrote:

>> Hi,
>> So, I have really been trying to find information about this without
>> annoying the list, but I just can't seem to get a clear picture of it. I
>> was going to search the mailing list archive, but it returns an error
>> right now (posting below, and sending to the listed address in error).
>>
>Google (as in all the various archives of this ML) works well for me;
>as always, the results depend on picking "good" search strings.
>
>> I have been working for a couple of months now (slowly) on testing out
>> Ceph. I only have a small PoC setup. I have 6 hosts, but I am only using
>> 3 of them in the cluster at the moment. They each have 6 SSDs (only 5
>> usable by Ceph), but the networks (1 public, 1 cluster) are only 1Gbps.
>> I have the MONs running on the same 3 hosts, and an OSD process running
>> for each of the 5 disks per host. The cluster shows good health, with
>> 15 OSDs. I have one pool there, the default rbd, which I set up with
>> 512 PGs.
>>
>Exact SSD models, please.
>Also CPU, though at 1GbE that isn't going to be your problem.

#Lewis: Each SSD is this model:
  Model Family: Samsung based SSDs
  Device Model: Samsung SSD 840 PRO Series
Each of the 3 nodes has 2 x Intel E5645 CPUs, with 48GB of memory.

>> I have created an rbd image on the pool, and I have it mapped and
>> mounted on another client host.
>Mapped via the kernel interface?

#Lewis: On the client node (which has the same specs as the 3 others), I
used the 'rbd map' command to map a 100GB rbd image to rbd0, then created
an xfs FS on it and mounted it.

>> When doing write tests, like with 'dd', I am getting rather spotty
>> performance.
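#Lewis: For reference, the mapping sequence I used looks roughly like this (the image name and mount point here are just examples, and the device node may differ on your system):

```shell
# Create a 100GB image in the default rbd pool (image name is illustrative)
rbd create testimg --size 102400 --pool rbd

# Map it through the kernel rbd driver; this prints the device node, e.g. /dev/rbd0
rbd map rbd/testimg

# Put an xfs filesystem on it and mount it
mkfs.xfs /dev/rbd0
mkdir -p /mnt/set1
mount /dev/rbd0 /mnt/set1
```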
>Example dd command line please.

#Lewis: I put those below.

>> Not only is it up and down, but even when it is up, the performance
>> isn't that great. On large'ish (4GB sequential) writes, it averages
>> about 65MB/s, and on repeated smaller (40MB) sequential writes, it
>> jumps around between 20MB/s and 80MB/s.
>>
>Monitor your storage nodes during these test runs with atop (or iostat)
>and see how busy your actual SSDs are then.
>Also test with "rados bench" to get a baseline.

#Lewis: I have all the nodes instrumented with collectd. I am seeing each
disk writing at only ~25MB/s during the write tests. I will check out the
'rados bench' command, as I have not tried it yet.

>> However, with read tests, I am able to completely max out the network,
>> easily reaching 125MB/s. Tests directly on the disks get up to 550MB/s
>> reads and 350MB/s writes. So, I know it isn't a problem with the disks.
>>
>How did you test these speeds? Exact command line, please.
>There are SSDs that can write very fast with buffered I/O but are
>abysmally slow with sync/direct I/O.
>Which is what Ceph journals use.

#Lewis: I have mostly been testing with just dd, though I have also run
several fio tests. With dd, I have tested writing 4GB files with both 4k
and 1M block sizes (and get about the same results, on average):

dd if=/dev/zero of=/mnt/set1/testfile700 bs=4k count=1000000 conv=fsync
dd if=/dev/zero of=/mnt/set1/testfile700 bs=1M count=4000 conv=fsync

>See the various threads in here and the "classic" link:
>https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

#Lewis: I have been reading over a lot of his articles. They are really
good. I had not seen that one; thank you for pointing it out.

>> I guess my question is, are there any additional optimizations or
>> tuning I should review here. I have read over all the docs, but I
>> don't know which, if any, of the values would need tweaking.
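#Lewis: The buffered-vs-sync difference described above is easy to see with dd alone; here is a minimal comparison (the file path is just an example):

```shell
# Buffered write: the page cache absorbs the data, so throughput looks inflated.
dd if=/dev/zero of=/tmp/ddtest bs=4k count=1000

# Sync write: oflag=dsync forces every 4k block to stable storage before the
# next one is issued, which is roughly the pattern the Ceph journal produces.
# Consumer SSDs without power-loss protection often drop to a few MB/s here.
dd if=/dev/zero of=/tmp/ddtest bs=4k count=1000 oflag=dsync

# Clean up the test file
rm -f /tmp/ddtest
```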
>> Also, I am not sure if this is just how it is with Ceph, given the need
>> to write multiple copies of each object. Is the slower write
>> performance (averaging ~1/2 of the network throughput) to be expected?
>> I haven't seen any clear answer on that in the docs or in the articles
>> I have found. So, I am not sure if my expectation is just wrong.
>>
>While replication incurs some performance penalties, this is mostly an
>issue with small I/Os, not the type of large sequential writes you're
>doing.
>I'd expect a setup like yours to deliver more or less full line speed, if
>your network and SSDs are working correctly.
>
>In my crappy test cluster with an identical network setup to yours, 4
>nodes with 4 crappy SATA disks each (so 16 OSDs), I can get better and
>more consistent write speeds than you, around 100MB/s.
>
>Christian
>
>> Anyway, some basic ideas on those concepts or some pointers to good
>> docs or articles would be wonderful. Thank you!
>>
>> Lewis George
>
>--
>Christian Balzer        Network/Systems Engineer
>[email protected]        Global OnLine Japan/Rakuten Communications
>http://www.gol.com/
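#Lewis: For the 'rados bench' baseline mentioned above, a minimal run would look something like this (the pool name and duration are just examples):

```shell
# 60-second write benchmark against the rbd pool (4MB objects by default);
# --no-cleanup keeps the objects so a read benchmark can follow
rados bench -p rbd 60 write --no-cleanup

# Sequential read benchmark over the objects written above
rados bench -p rbd 60 seq

# Remove the leftover benchmark objects
rados -p rbd cleanup
```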
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
