Hi Christian,

On 08 Jul 2016, at 02:22, Christian Balzer <ch...@gol.com> wrote:


Hello,

On Thu, 7 Jul 2016 23:19:35 +0200 Zoltan Arnold Nagy wrote:

Hi Nick,

What size NVMe drives are you running per 12 disks?

In my current setup I have 4x P3700 per 36 disks, but I feel like I could
get by with 2… Just looking for community experience :-)

This is funny, because you ask Nick about the size and don't mention it
yourself. ^o^

You are absolutely right, my bad. We are using the 400GB models.

As I speculated in my reply, it's the 400GB model and Nick didn't dispute
that.
And I shall assume the same for you.

You could get by with 2 of the 400GB ones, but that depends on a number of
things.

1. What's your use case, typical usage pattern?
Are you doing a lot of large sequential writes or is it mostly smallish
I/Os?
HDD OSDs will clock in at about 100MB/s with OSD bench, but realistically
won't see more than 50-60MB/s, so with 18 of them per one 400GB P3700 you're
about on par.

Our usage varies so much that it’s hard to put a finger on it.
Some days it’s this, some days it’s that. It’s an internal cloud with a bunch of researchers.
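
As a rough sanity check on your point 1 (a back-of-the-envelope sketch in
Python; the ~1080MB/s sequential-write figure for the 400GB P3700 is
Intel's spec-sheet number, the per-HDD throughput is your estimate above):

    # Aggregate HDD write throughput behind one journal NVMe vs. the
    # journal's own sequential write speed.
    hdds_per_nvme = 18        # 36 disks / 2 NVMe journals
    hdd_mb_s = 55             # midpoint of the realistic 50-60MB/s range
    p3700_400gb_mb_s = 1080   # spec-sheet sequential write, 400GB P3700

    print(hdds_per_nvme * hdd_mb_s)   # ~990 MB/s from the HDDs
    print(p3700_400gb_mb_s)           # ~1080 MB/s into the journal
    # ~990 vs ~1080 MB/s -> "about on par", as you say.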


2. What's your network setup? If you have more than 20Gb/s to that node,
your journals will likely become the (write) bottleneck.
But that's only the case with backfills or, again, largish sequential writes,
of course.

Currently it’s bonded (LACP) 2x10Gbit for both the front end and back end, but we’re soon going to
upgrade to 4x10Gbit front and 2x100Gbit back. (We already have a test cluster with this setup.)
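
Putting numbers on the 20Gb/s threshold (a sketch, assuming two 400GB
P3700 journals per node and the same ~1080MB/s spec-sheet write speed):

    # Incoming write bandwidth vs. what two journal NVMes can absorb.
    network_mb_s = 20 * 1000 / 8    # 2x10Gbit LACP -> ~2500 MB/s
    journal_mb_s = 2 * 1080         # two 400GB P3700s -> ~2160 MB/s

    print(network_mb_s > journal_mb_s)  # True: past ~20Gb/s, the two
                                        # journals are the write bottleneck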

3. A repeat of sorts of the previous 2 points, this time with the focus on
endurance. How much data are you writing per day to an average OSD?
With 18 OSDs per 400GB P3700 NVMe you will want that to be less than
223GB/day/OSD.
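
(For reference, the arithmetic behind that budget, using Intel's rated
endurance of 7.3PB written over the 5-year warranty for the 400GB P3700:)

    # 400GB P3700: 7.3PB written over 5 years (= 10 drive writes/day),
    # shared by 18 journaled OSDs.
    nvme_gb_per_day = 7.3 * 1e6 / (5 * 365)   # ~4000 GB/day per NVMe
    per_osd = nvme_gb_per_day / 18            # ~222 GB/day/OSD

    print(round(per_osd))   # 222, i.e. the ~223GB/day/OSD quoted above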

We’re growing at around 100TB/month spread over ~130 OSDs at the moment, which gives me ~25GB/day/OSD
(I wish it were that uniformly distributed :))
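
The arithmetic behind my ~25GB figure, for reference:

    # ~100TB/month of new data spread over ~130 OSDs, ~30 days/month.
    gb_per_osd_day = 100 * 1000 / 130 / 30
    print(round(gb_per_osd_day, 1))   # ~25.6 GB/day/OSD -- comfortably
                                      # inside the ~222GB/day/OSD budget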

4. As usual, failure domains. In the case of an NVMe failure you'll lose
twice as many OSDs.
Right, but having a lot of nodes (20+) mitigates this somewhat.

That all being said, at 36 OSDs I'd venture you'll run out of CPU steam
(with small write IOPS) before your journals become the bottleneck.

I agree, but that has not been the case so far.

Christian

Cheers,
Zoltan

[snip]


--
Christian Balzer        Network/Systems Engineer                
ch...@gol.com    Global OnLine Japan/Rakuten Communications
http://www.gol.com/


