On 23/06/14 18:51, Christian Balzer wrote:
On Sunday, June 22, 2014, Mark Kirkwood <[email protected]> wrote:
rbd cache max dirty = 1073741824
rbd cache max dirty age = 100
Mark, you're giving it a 2GB cache.
For a write test that's 1GB in size.
"Aggressively set" is a bit of an understatement here. ^o^
Most people will not want to spend this much memory on write-only caching.
Of course with these settings that test will yield impressive results.
However, if you observe your storage nodes (the OSDs), you will see that
the data still takes just as long to actually reach disk. The same applies
when using kernelspace RBD with caching enabled in the VM.
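For comparison, a sketch of the stock rbd cache settings (values quoted
from memory for Ceph of that era, so treat the exact numbers as an
assumption) shows how far above the defaults the configuration quoted
above sits:

```ini
[client]
rbd cache = true
# defaults: 32 MiB cache, 24 MiB max dirty, 16 MiB target dirty,
# dirty data no older than 1 second before writeback begins
rbd cache size = 33554432
rbd cache max dirty = 25165824
rbd cache target dirty = 16777216
rbd cache max dirty age = 1.0
```

With the defaults, at most a few tens of megabytes can sit dirty in the
cache; the quoted settings allow a gigabyte of dirty data to age for up
to 100 seconds.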
Doing similar tests with fio, I managed to fill the cache and got fantastic
IOPS, but it then took minutes for the cache to finally drain.
This resulted in hung task warnings for the jbd process(es) like this:
---
May 28 16:58:56 tvm-03 kernel: [ 960.320182] INFO: task jbd2/vda1-8:153 blocked
for more than 120 seconds.
May 28 16:58:56 tvm-03 kernel: [ 960.320866] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
---
Now this doesn't actively break things AFAICT, but it left me feeling
quite uncomfortable nevertheless.
Also, what happens if something "bad" happens to the VM or its host before
the cache is drained?
From where I'm standing, the RBD cache is fine for merging really small
writes, and that's it.
Yes! And thank you Christian for writing (something very similar to)
what I was about to write in response to Greg's question!
For database types (and yes, I'm one of those) you want to know that
your writes (particularly your commit writes) are actually making it to
persistent storage (that ACID thing, you know). Now I see the RBD cache
much like battery-backed RAID cards: your commits (i.e. fsync or O_DIRECT
writes) are not actually written but cached, so you are depending on the
reliability of a) your RAID controller, battery etc. in that case, or,
more interestingly, b) your Ceph topology to withstand node failures.
Given we usually design a Ceph cluster with these things in mind, it is
probably ok [1]!
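The commit-write pattern being discussed can be sketched in Python with a
plain fsync (the `commit_write` helper below is hypothetical, for
illustration only): the write lands in the OS page cache, and the fsync is
what forces it toward persistent storage; with an rbd cache underneath, the
resulting flush request is what the cache must honour before acknowledging.

```python
import os
import tempfile

def commit_write(path, data):
    """Append data and force it to stable storage, as a database
    commit would. Hypothetical helper, not a Ceph API."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, data)
        # Without this fsync, the data may sit in a volatile cache
        # (page cache, rbd cache) for an arbitrary amount of time.
        os.fsync(fd)
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "wal.log")
commit_write(path, b"COMMIT;\n")
```

The durability guarantee is only as strong as whatever sits below the
fsync, which is exactly the RAID-battery analogy above.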
Regards
Mark
[1] Obviously my setup in use here - 2 OSDs, 2 SATA and 2 SSD, all on the
same host - is merely a play/benchmark config and is *not* a topology
designed with reliability in mind!
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com