Hi John,

The 50/50 thing comes from the way the Ceph OSD writes data twice:
first to the journal, and then to the data partition.  Whether the
write doubling actually hurts your performance depends on the ratio of
drive bandwidth to network bandwidth and on the I/O pattern.  In
configurations where it is an issue, the way to improve performance is
to put the journals on an SSD (Sébastien mentions this in his article
under "Commodity improved").
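
As a rough back-of-the-envelope sketch (the 100 MB/s figures below are
just illustrative assumptions, not measurements), the effect of the
double write on a single shared disk works out like this:

    # Sustained client throughput when the journal and the data partition
    # share one spinning disk (writeahead mode). Illustrative numbers only.
    disk_write_mb_s = 100.0   # assumed sustained write speed of the disk
    network_mb_s = 100.0      # roughly what a gigabit link can deliver

    # Every client byte is written twice: once to the journal (O_DIRECT)
    # and once to the backing filesystem when the filestore flushes.
    writes_per_byte = 2

    shared_disk = min(network_mb_s, disk_write_mb_s / writes_per_byte)
    print("journal on the data disk:  ~%.0f MB/s" % shared_disk)   # ~50 MB/s

    # With the journal on a separate SSD, the data disk no longer shares
    # its bandwidth with journal writes.
    ssd_journal = min(network_mb_s, disk_write_mb_s)
    print("journal on a separate SSD: ~%.0f MB/s" % ssd_journal)   # ~100 MB/s

That steady-state ~50 MB/s is where the 50/50 split comes from.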

The journal offers quite a lot of flexibility; the relevant settings
are in the docs here:
http://ceph.com/docs/master/rados/configuration/journal-ref/
http://ceph.com/docs/master/rados/configuration/osd-config-ref/#journal-settings
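
For example, something along these lines in ceph.conf moves the journal
onto a separate SSD partition (the partition path and size here are
placeholders to adapt to your own layout):

    [osd]
        # hypothetical example: point the journal at a partition on an SSD
        osd journal = /dev/disk/by-partlabel/journal-$id
        # journal size in MB
        osd journal size = 10240
        # maximum number of seconds the filestore waits before flushing
        # journaled writes to the backing filesystem (default 5)
        filestore max sync interval = 5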

There is some discussion of the use of SSDs with Ceph here:
http://ceph.com/docs/master/start/hardware-recommendations/#solid-state-drives

I'm sure others on this list will have more empirical information
about their experiences in this area.

Cheers,
John

On Thu, Feb 6, 2014 at 6:18 PM, John Mancuso <jmanc...@freewheel.tv> wrote:
> Hey all, I'm currently poring through the Ceph docs trying to familiarize
> myself with the product before I begin my cluster build-out for a
> virtualized environment. One area I've been looking into is disk
> throughput/performance.
>
> I stumbled onto the following site:
>
> http://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/
>
> 1) I'm not sure where the info below originates, as I did not see it on
> the Ceph doc site, unless it is hidden in some dark corner somewhere.
> Can anyone point me to a wiki/URL?
>
> 2) Can someone describe this 50/50 split of journal vs. filesystem
> (I assume it has something to do with the filestore flush)?
>
> "Consideration about the ceph's journal. The journal is by design the
> component that could be severely and easily improved. Take a little step
> back over it. As a reminder the ceph's journal serves 2 purposes:
>
> It acts as a buffer cache (FIFO buffer). The journal takes every request and
> performs each write with O_DIRECT. After a determined period and
> acknowledgment the journal flush his content to the backend filesystem. By
> default this value is set to 5 seconds and called filestore max sync
> interval. The filestore starts to flush when the journal is half-full or max
> sync interval is reached.
> Failure coverage, pending writes are handled by the Journal if not committed
> yet to the backend filesystem.
>
> The journal can operate in two modes, parallel and writeahead; the mode is
> detected automatically according to the filesystem used by the OSD backend
> storage. Parallel mode is only supported by Btrfs.
>
> In practice, a common gigabit network can write about 100 MB/sec. Let's say
> your journal and your backend storage are stored on the same disk, and that
> disk has a write speed of 100 MB/sec. With the default writeahead mode, the
> write speed will be split after 5 seconds (the default interval after which
> the journal starts to flush to the backend filesystem).
>
> The first 5 seconds write at 100 MB/sec; after that, writes are split like
> so:
>
> 50 MB/sec for the journal
> 50 MB/sec for the backend filesystem"
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
