On 23/06/14 18:51, Christian Balzer wrote:
On Sunday, June 22, 2014, Mark Kirkwood <[email protected]> wrote:
rbd cache max dirty = 1073741824
rbd cache max dirty age = 100


Mark, you're giving it a 2GB cache.
For a write test that's 1GB in size.
"Aggressively set" is a bit of an understatement here. ^o^
Most people will not want to spend this much memory on write-only caching.
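For comparison, these settings normally live in the [client] section of ceph.conf, and the shipped defaults are far smaller (a 32 MB cache with 24 MB max dirty). A more conservative sketch - the values below are illustrative defaults, not tuning recommendations - would be:

```ini
[client]
rbd cache = true
rbd cache size = 33554432                  # 32 MB total cache (the default)
rbd cache max dirty = 25165824             # 24 MB dirty data before writeback starts (the default)
rbd cache max dirty age = 1.0              # flush dirty data after at most 1 second
rbd cache writethrough until flush = true  # stay in writethrough until the guest issues a flush
```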

Of course with these settings that test will yield impressive results.

However, if you observe your storage nodes (the OSDs), you will see that the
data still takes the same amount of time until it is actually, finally written
to disk. The same applies when using kernelspace RBD with caching enabled in
the VM. Doing similar tests with fio I managed to fill the cache and got
fantastic IOPS, but it then took minutes for the cache to finally drain.
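As a rough illustration of that kind of test - this is a hypothetical fio job file, and the target device and sizes are assumptions, not the exact job used - buffered random writes like these land in the cache without anything forcing a flush:

```ini
[rbd-cache-fill]
ioengine=libaio
rw=randwrite
bs=4k
size=1g
iodepth=32
direct=0               ; buffered I/O, so the writeback cache can absorb it all
filename=/dev/vdb      ; assumed RBD-backed virtio disk inside the guest
```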

Resulting in hung-task warnings for the jbd2 process(es) like this:
---
May 28 16:58:56 tvm-03 kernel: [  960.320182] INFO: task jbd2/vda1-8:153 blocked
  for more than 120 seconds.
May 28 16:58:56 tvm-03 kernel: [  960.320866] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
---

Now this doesn't actively break things AFAICT, but it left me feeling
quite uncomfortable nevertheless.

Also, what happens if something "bad" happens to the VM or its host before
the cache is drained?

From where I'm standing, the RBD cache is fine for merging really small
writes, and that's it.

Yes! And thank you Christian for writing (something very similar to) what I was about to write in response to Greg's question!

For database types (and yes, I'm one of those) you want to know that your writes (particularly your commit writes) are actually making it to persistent storage (that ACID thing, you know). Now, I see the RBD cache much like a battery-backed RAID card: your commits (i.e. fsync or O_DIRECT writes) are not actually written but cached, so you are depending on the reliability of a) your RAID controller battery etc. in that case, or, more interestingly, b) your Ceph topology and its ability to withstand node failures. Given that we usually design a Ceph cluster with these things in mind, it is probably OK [1]!
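A minimal sketch of the durability contract databases rely on (Python, purely illustrative - the file and payload are made up): the commit may only be acknowledged after fsync() returns, because until then the data may still be sitting in a volatile cache such as the page cache or the RBD cache:

```python
import os
import tempfile

def durable_commit(payload: bytes) -> bytes:
    """Write payload and fsync before trusting that it is persistent."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)        # data lands in a cache, not yet durable
        os.fsync(fd)                 # durability point: only now may we ack the commit
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))
    finally:
        os.close(fd)
        os.remove(path)

print(durable_commit(b"COMMIT;"))
```

With a writeback cache in the path, the guarantee after fsync() is only as strong as whatever that cache does with the flush request, which is exactly the dependency described above.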

Regards

Mark

[1] Obviously my setup in use here (2 OSDs, 2 SATA and 2 SSD drives, all on the same host) is merely a play/benchmark config and is *not* a topology designed with reliability in mind!
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
