Hi Christian,
 Thank you for the follow-up on this. 
  
 I answered those questions inline below.
  
 Have a good day,
  
 Lewis George
  

----------------------------------------
 From: "Christian Balzer" <[email protected]>
Sent: Thursday, August 18, 2016 6:31 PM
To: [email protected]
Cc: "[email protected]" <[email protected]>
Subject: Re: [ceph-users] Understanding write performance   

Hello,

On Thu, 18 Aug 2016 12:03:36 -0700 [email protected] wrote:

>> Hi,
>> So, I have really been trying to find information about this without
>> annoying the list, but I just can't seem to get any clear picture of it. I
>> was going to try to search the mailing list archive, but it seems there is
>> an error when trying to search it right now (posting below, and sending to
>> listed address in error).
>>
>>
>Google (as in all the various archives of this ML) works well for me,
>as always the results depend on picking "good" search strings.
>
>> I have been working for a couple of months now (slowly) on testing out
>> Ceph. I only have a small PoC setup. I have 6 hosts, but I am only using 3
>> of them in the cluster at the moment. They each have 6 SSDs (only 5 usable
>> by Ceph), but the networks (1 public, 1 cluster) are only 1Gbps. I have the
>> MONs running on the same 3 hosts, and I have an OSD process running for
>> each of the 5 disks per host. The cluster shows in good health, with 15
>> OSDs. I have one pool there, the default rbd, which I set up with 512 PGs.
>>
>>
>Exact SSD models, please.
>Also CPU, though at 1GbE that isn't going to be your problem.
  
#Lewis: Each SSD is this model:
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 840 PRO Series

Each of the 3 nodes has 2 x Intel E5645 CPUs, with 48GB of memory.

>> I have created an rbd image on the pool, and I have it mapped and mounted
>> on another client host.
>Mapped via the kernel interface?
  
#Lewis: On the client node (which has the same specs as the other 3), I used
the 'rbd map' command to map a 100GB rbd image to rbd0, then created an xfs
filesystem on it and mounted it.

>>When doing write tests, like with 'dd', I am
>> getting rather spotty performance.
>Example dd command line please.
  
 #Lewis: I put those below.

>> Not only is it up and down, but even
>> when it is up, the performance isn't that great. On large-ish (4GB
>> sequential) writes, it averages about 65MB/s, and on repeated smaller (40MB)
>> sequential writes, it is jumping around between 20MB/s and 80MB/s.
>>
>>
>Monitor your storage nodes during these test runs with atop (or iostat)
>and see how busy your actual SSDs are then.
>Also test with "rados bench" to get a base line.
  
#Lewis: I have all the nodes instrumented with collectd. I am seeing each
disk writing at only ~25MB/s during the write tests. I will check out the
'rados bench' command, as I have not tried it yet.
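From what I can tell, a baseline run against my pool would look something like this (pool name from my setup; the concurrency and object size are the tool's defaults as I understand them):

```shell
# Baseline write test against the 'rbd' pool: 60 seconds, with the
# default 16 concurrent 4 MB operations. --no-cleanup keeps the
# objects around so a read test can follow.
rados bench -p rbd 60 write --no-cleanup
rados bench -p rbd 60 seq                 # sequential reads of those objects
rados -p rbd cleanup                      # remove the benchmark objects afterwards
```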

>> However, with read tests, I am able to completely max out the network
>> there, easily reaching 125MB/s. Tests on the disks directly are able to get
>> up to 550MB/s reads and 350MB/s writes. So, I know it isn't a problem with
>> the disks.
>>
>>
>How did you test these speeds? Exact command line, please.
>There are SSDs that can write very fast with buffered I/O but are
>abysmally slow with sync/direct I/O.
>Which is what Ceph journals use.
  
#Lewis: I have mostly been testing with just dd, though I have also tested
with several fio jobs. With dd, I have tested writing 4GB files with both
4k and 1M block sizes (I get about the same results, on average).
  
 dd if=/dev/zero of=/mnt/set1/testfile700 bs=4k count=1000000 conv=fsync
 dd if=/dev/zero of=/mnt/set1/testfile700 bs=1M count=4000 conv=fsync
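I may also try a direct-I/O variant, to take the page cache out of the picture, since (as you note) buffered and sync/direct results can differ a lot on SSDs:

```shell
# Same write test, but with oflag=direct so each write bypasses the page
# cache. The conv=fsync runs above only sync once at the very end, which
# can hide sync-write latency.
dd if=/dev/zero of=/mnt/set1/testfile700 bs=1M count=4000 oflag=direct
```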

>See the various threads in here and the "classic" link:
>https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
  
#Lewis: I have been reading over a lot of his articles. They are really
good. I had not seen that one. Thank you for pointing it out.
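If I understand the article correctly, the test boils down to synchronous, direct 4k writes straight to the raw SSD, something along these lines (destructive to data on the target device, so a spare disk or partition is needed):

```shell
# Journal-suitability test along the lines of the linked article:
# synchronous, direct 4k sequential writes to the raw device.
# WARNING: writes to /dev/sdX directly -- use a spare device/partition.
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test
```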

>> I guess my question is: are there any additional optimizations or tuning I
>> should review here? I have read over all the docs, but I don't know which,
>> if any, of the values would need tweaking. Also, I am not sure if this is
>> just how it is with Ceph, given the need to write multiple copies of each
>> object. Is the slower write performance (averaging ~1/2 of the network
>> throughput) to be expected? I haven't seen any clear answer on that in the
>> docs or in the articles I have found. So, I am not sure if my expectation
>> is just wrong.
>>
>>
>While the replication incurs some performance penalties, this is mostly an
>issue with small I/Os, not the type of large sequential writes you're
>doing.
>I'd expect a setup like yours to deliver more or less full line speed, if
>your network and SSDs are working correctly.
>
>In my crappy test cluster with an identical network setup to yours, 4
>nodes with 4 crappy SATA disks each (so 16 OSDs), I can get better and
>more consistent write speed than you, around 100MB/s.
>
>Christian
>
>> Anyway, some basic idea on those concepts or some pointers to some good
>> docs or articles would be wonderful. Thank you!
>>
>> Lewis George
>>
>>
>>
>
>
>--
>Christian Balzer Network/Systems Engineer
>[email protected] Global OnLine Japan/Rakuten Communications
>http://www.gol.com/
 


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
