Hi,
My $0.02:
> Secondly, I'm unclear about how OSDs use the journal. It appears they
> write to the journal (in all cases, can't be turned off), ack to the
> client and then read the journal later to write to backing storage. Is
> that correct?
I would say NO: the journal is NEVER READ except during recovery (the journal
is replayed in that case).
There are two configuration options, "filestore journal parallel" and
"filestore journal writeahead".
With "journal parallel", the data is written to both the journal and the OSD
in parallel; as soon as either the journal or the OSD finishes the write, Ceph
acks to the client. This is ONLY for BTRFS, since BTRFS has a built-in
mechanism that helps keep consistency.
With "journal writeahead", the data is first written to the journal, acked to
the client, and then written to the OSD. Note that the data always stays in
memory until it has been written to both the OSD and the journal, so the write
to the OSD comes directly from memory. This mode suits XFS and EXT4.
The term "write to journal" means the data is physically written to the
journal, but that is not the case for "write to OSD": Ceph opens the file on
the OSD withOUT O_DIRECT, so the write goes to the page cache (kernel cache).
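To make the writeahead ordering concrete, here is a toy sketch in Python. It is NOT Ceph's journal format or code: the record header, the function names and the file layout are all made up for illustration. It only shows the order of operations described above (journal write forced to disk, ack, then a buffered write from the same in-memory buffer) and that the journal is read back only on replay.

```python
import os
import struct
import tempfile

# Hypothetical record header for this toy: data offset (u64), payload length (u32).
HEADER = struct.Struct("<QI")


def journaled_write(journal_fd, data_fd, offset, payload, ack):
    """Toy model of the 'journal writeahead' ordering."""
    # 1. Append the record to the journal and force it to stable storage.
    os.write(journal_fd, HEADER.pack(offset, len(payload)) + payload)
    os.fsync(journal_fd)
    # 2. The write is durable in the journal, so the client can be acked.
    ack(offset)
    # 3. Apply the same in-memory buffer to the data file. There is no
    #    O_DIRECT and no fsync here, so this lands in the page cache.
    os.lseek(data_fd, offset, os.SEEK_SET)
    os.write(data_fd, payload)


def replay(journal_path, data_fd):
    """Recovery path: the only place in this toy where the journal is read."""
    with open(journal_path, "rb") as journal:
        while True:
            head = journal.read(HEADER.size)
            if len(head) < HEADER.size:
                break
            offset, length = HEADER.unpack(head)
            payload = journal.read(length)
            os.lseek(data_fd, offset, os.SEEK_SET)
            os.write(data_fd, payload)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        journal_path = os.path.join(tmp, "journal")
        jfd = os.open(journal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        dfd = os.open(os.path.join(tmp, "data"), os.O_RDWR | os.O_CREAT, 0o600)
        acked = []
        journaled_write(jfd, dfd, 0, b"hello", acked.append)
        journaled_write(jfd, dfd, 5, b"world", acked.append)
        os.close(jfd)
        os.close(dfd)
        # Pretend the data file was lost and rebuild it from the journal.
        rfd = os.open(os.path.join(tmp, "recovered"), os.O_RDWR | os.O_CREAT, 0o600)
        replay(journal_path, rfd)
        os.lseek(rfd, 0, os.SEEK_SET)
        recovered = os.read(rfd, 10)
        os.close(rfd)
        return acked, recovered


if __name__ == "__main__":
    print(demo())  # ([0, 5], b'helloworld')
```

In normal operation only journaled_write() runs, so the journal device sees writes only; replay() is the recovery case.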
> On a similar note, I am using XFS on the OSDs which also journals, does this
> affect performance in any way?
Again, NO. The XFS journal only journals file-system metadata; it never
journals the data extents, so you cannot rely on the XFS journal.
> Can you share any information on the SSD you are using, is it PCIe
> connected?
It depends. If you use HDDs as your OSD data disks, a SATA/SAS SSD is enough.
Instead of the Intel 520, I would suggest the Intel DC S3700, since it
provides better write endurance. Since a DC S3700 can provide 400~500MB/s for
writes and an HDD can only do ~100MB/s, it's safe for one DC S3700 to provide
the journal for 4~5 HDDs.
And if you have some insight into or assumption about your workload, say "I
don't care about throughput at all, all my workload is random access", then
with that assumption you can have a much higher HDD:SSD ratio; 8:1 or even
10:1 would also be fine.
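The arithmetic behind those ratios can be sketched as below. The 400~500MB/s and ~100MB/s figures are the ones quoted above; the 50MB/s per-HDD figure for a random-access workload is purely an illustrative assumption of mine, not a measurement, and the function name is made up.

```python
def hdds_per_journal_ssd(ssd_write_mb_s, per_hdd_mb_s):
    """Number of HDD journals one SSD can absorb at the given sustained rates."""
    if ssd_write_mb_s <= 0 or per_hdd_mb_s <= 0:
        raise ValueError("throughput figures must be positive")
    # Every byte written to an HDD-backed OSD also passes through its journal,
    # so the SSD must sustain the sum of the per-HDD write rates.
    return int(ssd_write_mb_s // per_hdd_mb_s)


if __name__ == "__main__":
    # Sequential case, using the figures from this mail.
    print(hdds_per_journal_ssd(400, 100))  # 4
    print(hdds_per_journal_ssd(500, 100))  # 5
    # Random-access case; 50 MB/s per HDD is an assumed, illustrative rate.
    print(hdds_per_journal_ssd(400, 50))   # 8
    print(hdds_per_journal_ssd(500, 50))   # 10
```

Plug in measured numbers for your own drives and workload before settling on a ratio.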
But if you want to use SSDs as data disks, you may need to find something
really, really fast to journal them. A high-end PCIe SSD or NVRAM may be the
choice.
Xiaoxi
From: [email protected] On Behalf Of Charles 'Boyo
Sent: Monday, July 22, 2013 5:04 AM
To: Mikaël Cluseau
Cc: [email protected]
Subject: Re: [ceph-users] SSD recommendations for OSD journals
Thank you for the information Mikael.
Counting on the kernel's cache, it appears I will be best served purchasing
write-optimized SSDs?
Can you share any information on the SSD you are using, is it PCIe connected?
Another question, since the intention of this storage cluster is relatively
cheap storage on commodity hardware, what's the balance between cheap SSDs and
reliability since journal failure might result in data loss or will such an
event just 'down' the affected OSDs? On a similar note, I am using XFS on the
OSDs which also journals, does this affect performance in any way?
Charles
On Sun, Jul 21, 2013 at 9:27 PM, Mikaël Cluseau <[email protected]> wrote:
Hi,
On 07/22/13 06:05, Charles 'Boyo wrote:
Secondly, I'm unclear about how OSDs use the journal. It appears they write to
the journal (in all cases, can't be turned off), ack to the client and then
read the journal later to write to backing storage. Is that correct?
Yes
I'm coming from enterprise ZFS, where an SSD is also used for write journalling
but data flushes are from the disk cache in memory, hence the use of
write-optimized SSDs. Why can't Ceph be configured to write from RAM instead of
reading the journal on flush?
From my stats I can tell that the journal flushes use the kernel's cache and
do not hit the SSD. Here, sdd is my journal SSD: