First, on your comment: "we found that during times where the cache pool flushed to the storage pool client IO took a severe hit"
We found the same thing: http://blog.wadeit.io/ceph-cache-tier-performance-random-writes/ -- I don't claim this is a great write-up, and it's not what a lot of folks are interested in, but it is what I was after.

Great on your fio test. However, take a look at the response time: naturally it will increase after 4-5 concurrent writes, which is of course what you were saying and is correct. However, I think we can generally accept a slightly higher response time, and therefore iodepth > 1 is a more real-world test. Just my thoughts. You did the right thing, and tested well.

Some might not like it, but I like Sebastien's journal size calculation and it has served me well: http://slides.com/sebastienhan/ceph-performance-and-benchmarking#/24

Cheers
Wade

On Thu, Feb 4, 2016 at 7:24 AM Sascha Vogt <[email protected]> wrote:
> Hi,
>
> On 04.02.2016 at 12:59, Wade Holler wrote:
> > You referenced parallel writes for journal and data, which is the
> > default for btrfs but not XFS. Now you are mentioning multiple
> > parallel writes to the drive, which of course will occur.
> Ah, that is good to know. So if I want to create more "parallelism" I
> should use btrfs then. Thanks a lot, that's a very critical bit of
> information :)
>
> > Also, our Dell 400 GB NVMe drives do not top out around 5-7 sequential
> > writes as you mentioned. That would be 5-7 random writes from the
> > drive's perspective, and the NVMe drives can do many times that.
> Hm, I used the following fio bench from [1]:
>
> fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k
> --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
> --name=journal-test
>
> Our disks showed the following bandwidths (#<no> is the numjobs
> parameter):
>
> #1: write: io=1992.2MB, bw=33997KB/s, iops=8499
> #2: write: io=5621.6MB, bw=95940KB/s, iops=23984
> #3: write: io=8062.8MB, bw=137602KB/s, iops=34400
> #4: write: io=9114.1MB, bw=155545KB/s, iops=38886
> #5: write: io=8860.7MB, bw=151169KB/s, iops=37792
>
> Also for more jobs (tried up to 8) bandwidth stayed at around 150 MB/s
> and around 37k IOPS. So I figured that around 5 should be the sweet spot
> in terms of journals on a single disk.
>
> > I would park it at 5-6 partitions per NVMe, journal on the same disk.
> > Frequently I want more concurrent operations, rather than all-out
> > throughput.
> For journal on the same partition, should I limit the journal size? If
> yes, what should the limit be? Rather large or rather small?
>
> Greetings
> -Sascha-
>
> [1]
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
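To make the "sweet spot" reasoning explicit, here is a small sketch that computes the marginal IOPS gain per extra fio job, using the numbers reported above (the dict values are copied from those results; everything else is illustrative):

```python
# Total IOPS reported by the fio runs above, keyed by numjobs.
iops = {1: 8499, 2: 23984, 3: 34400, 4: 38886, 5: 37792}

# numjobs value with the highest total IOPS.
best = max(iops, key=iops.get)

for n, v in iops.items():
    if n == 1:
        print(f"numjobs={n}: {v} IOPS (baseline)")
    else:
        # Relative gain over the run with one fewer job.
        gain = v / iops[n - 1] - 1
        print(f"numjobs={n}: {v} IOPS ({gain:+.1%} vs {n - 1} jobs)")

print("peak total IOPS at numjobs =", best)
```

The marginal gain shrinks from roughly +182% (1 to 2 jobs) to +13% (3 to 4) and turns slightly negative at 5, which matches the observation that throughput plateaus around 4-5 concurrent writers.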
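On the journal size question: a minimal sketch of the sizing rule I linked above, assuming the usual Ceph guideline (journal size = 2 x expected throughput x filestore max sync interval); the function name and the example numbers are illustrative, not from Sebastien's slide verbatim:

```python
def journal_size_mb(throughput_mb_s, filestore_max_sync_interval_s=5):
    """Rule-of-thumb journal size in MB: twice the data the OSD can
    ingest between two filestore syncs, so the journal never fills
    before it is flushed."""
    return 2 * throughput_mb_s * filestore_max_sync_interval_s

# e.g. a journal partition absorbing ~150 MB/s with the default
# 5 s filestore max sync interval:
print(journal_size_mb(150))  # 1500 (MB)
```

In other words, rather small: the journal only needs to cover what accumulates between flushes, so a few GB per journal is typically plenty even on a fast NVMe.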
