On 07/16/2014 09:58 AM, Riccardo Murri wrote:
Hello,
I am new to Ceph; the group I'm working in is currently evaluating it
for our new large-scale storage.
Is there any recommendation for the OSD journals? E.g., does it make
sense to keep them on SSDs? Would it make sense to host the journal
on a RAID-1 array for added safety? (IOW: what happens if the journal
device fails and the journal is lost?)
Thanks for any explanation and suggestion!
Hi,
There are a couple of common configurations that make sense imho:
1) Leave the journals on the same disks as the data (best to give each
its own partition). This is a fairly safe option, since each OSD relies
on only a single disk (i.e., it minimizes potential failure points). It
can be slow, though that depends on the controller you use and possibly
the IO scheduler. A controller with writeback cache often helps avoid
seek contention during writes, but you will currently lose about half
of your disk throughput to journal writes during sequential write IO.
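For option 1, the journal location can be pointed at the dedicated partition in ceph.conf. A minimal filestore-era sketch, not a definitive config: the partition label scheme is a hypothetical example, `$id` is Ceph's per-daemon metavariable, and `osd journal size` is specified in MB:

```ini
[osd]
; journal in its own partition on the same disk as the data (option 1)
osd journal = /dev/disk/by-partlabel/osd-journal-$id  ; hypothetical partition label
osd journal size = 10240                              ; 10 GB journal, value is in MB
```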
2) Put the journals on SSDs. In this scenario you want to match your
per-journal SSD speed to your disk speed. For example, if you have an
SSD that can do 400MB/s and disks that can do ~125MB/s of sequential
writes, you probably want to put somewhere around 3-5 journals on the
SSD, depending on how much sequential write throughput matters to you.
Each OSD is now dependent on both the spinning disk and the SSD not
failing, and one SSD failure will take down multiple OSDs. You gain
speed, though, and may not need more expensive controllers with WB
cache (though they may still be useful for protecting against power
failure).
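The ratio above is just bandwidth matching; a back-of-the-envelope sketch, assuming sequential writes stream through the journal at full disk speed (the function name and numbers are illustrative, not from any Ceph tool):

```python
# Rough sizing for option 2: one SSD can feed about
# (SSD sequential write bandwidth / disk sequential write bandwidth)
# journals before the SSD becomes the bottleneck.

def max_journals(ssd_mb_s: int, disk_mb_s: int) -> int:
    """Journals one SSD can host before saturating on sequential writes."""
    return ssd_mb_s // disk_mb_s

# The figures from the thread: a 400 MB/s SSD and ~125 MB/s disks.
print(max_journals(400, 125))  # -> 3
```

In practice you might stretch this to 4-5 journals if your workload is not purely sequential writes, which is why the thread gives a 3-5 range rather than a hard cutoff.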
Some folks have used RAID-1 LUNs for the journals and it works fine,
but I'm not really a fan of it, especially with SSDs. You are doubling
the writes to the SSDs, and SSDs tend to fail in clumps based on the
number of writes they've absorbed. If the choice is between 6 journals
per SSD RAID-1 or 3 journals per SSD JBOD, I'd choose the latter. I'd
want to keep my overall OSD count high, though, to minimize the fallout
from 3 OSDs going down at once.
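The trade-off can be put in rough numbers; a sketch, assuming RAID-1 mirrors every journal write to both SSDs and that losing an unmirrored journal SSD takes down all of its OSDs (function and values are illustrative only):

```python
# Compare the two layouts from the thread: 6 journals on a RAID-1 SSD
# pair vs. 3 journals per standalone (JBOD) SSD.

def layout(journals: int, mirrored: bool) -> tuple[int, int]:
    """Return (journal writes each SSD absorbs, OSDs lost if one SSD fails)."""
    writes_per_ssd = journals                 # every SSD in the set sees all journal writes
    osds_lost = 0 if mirrored else journals   # a mirror survives a single SSD failure
    return writes_per_ssd, osds_lost

print(layout(6, mirrored=True))   # RAID-1: (6, 0) -- double the wear, no OSDs lost
print(layout(3, mirrored=False))  # JBOD:   (3, 3) -- half the wear, 3 OSDs lost
```

This is the crux of the argument: RAID-1 trades extra SSD wear (and correlated wear-out of the mirror pair) for a smaller blast radius on the first failure.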
Arguably, if you do use RAID-1, can swap failed SSDs quickly, and
anticipate that the remaining SSD is likely to die soon after the
first, maybe the RAID-1 is worth it. The disadvantages seem pretty
steep to me, though.
Mark
Riccardo
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com