I'd probably say 50GB to leave some extra space over-provisioned. 50GB should definitely prevent any DB operations from spilling over to the HDD.
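For reference, a minimal sketch of how that could look in ceph.conf, assuming the Luminous-era `bluestore_block_db_size` option (specified in bytes), which ceph-disk consults when carving out the block.db partition; the value below is just the 50GB suggested above, expressed as 50 GiB in bytes:

```shell
# Compute 50 GiB in bytes for bluestore_block_db_size; ceph-disk reads
# this option when it creates the block.db partition at OSD creation.
db_bytes=$(( 50 * 1024 * 1024 * 1024 ))
printf '[osd]\nbluestore block db size = %s\n' "$db_bytes"
```

Note this only takes effect for newly created OSDs; existing db partitions keep their size.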
On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov <[email protected]> wrote:

> Thank you,
>
> It is 4TB OSDs and they might become full someday; I'll try a 60GB db
> partition – this is for the max OSD capacity.
>
> - Rado
>
> *From:* David Turner [mailto:[email protected]]
> *Sent:* Tuesday, November 14, 2017 5:38 PM
> *To:* Milanov, Radoslav Nikiforov <[email protected]>
> *Cc:* Mark Nelson <[email protected]>; [email protected]
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
> You have to configure the size of the db partition in the config file
> for the cluster. If your db partition is 1GB, then I can all but
> guarantee that you were using your HDD for your blocks.db very early in
> your testing. There have been multiple threads recently about what size
> the db partition should be, and it seems to depend on how many objects
> your OSD is likely to have on it. The recommendation has been to err on
> the side of bigger. If you're running 10TB OSDs and anticipate filling
> them up, then you probably want closer to an 80GB+ db partition. That's
> why I asked how full your cluster was and how large your HDDs are.
>
> Here's a link to one of the recent ML threads on this topic:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
>
> On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov
> <[email protected]> wrote:
>
> Block-db partition is the default 1GB (is there a way to modify this?
> journals are 5GB in the filestore case) and usage is low:
>
> [root@kumo-ceph02 ~]# ceph df
> GLOBAL:
>     SIZE        AVAIL      RAW USED     %RAW USED
>     100602G     99146G     1455G        1.45
> POOLS:
>     NAME              ID     USED       %USED     MAX AVAIL     OBJECTS
>     kumo-vms          1      19757M     0.02      31147G        5067
>     kumo-volumes      2      214G       0.18      31147G        55248
>     kumo-images       3      203G       0.17      31147G        66486
>     kumo-vms3         11     45824M     0.04      31147G        11643
>     kumo-volumes3     13     10837M     0         31147G        2724
>     kumo-images3      15     82450M     0.09      31147G        10320
>
> - Rado
>
> *From:* David Turner [mailto:[email protected]]
> *Sent:* Tuesday, November 14, 2017 4:40 PM
> *To:* Mark Nelson <[email protected]>
> *Cc:* Milanov, Radoslav Nikiforov <[email protected]>;
> [email protected]
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
> How big was your blocks.db partition for each OSD, and what size are
> your HDDs? Also, how full is your cluster? It's possible that your
> blocks.db partition wasn't large enough to hold the entire db and it
> had to spill over onto the HDD, which would definitely impact
> performance.
>
> On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson <[email protected]> wrote:
>
> How big were the writes in the windows test, and how much concurrency
> was there?
>
> Historically bluestore does pretty well for us with small random
> writes, so your write results surprise me a bit. I suspect it's the low
> queue depth. Sometimes bluestore does worse with reads, especially if
> readahead isn't enabled on the client.
>
> Mark
>
> On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
>
> Hi Mark,
>
> Yes, RBD is in writeback, and the only thing that changed was
> converting the OSDs to bluestore. It is 7200 rpm drives and triple
> replication. I also get the same results (bluestore 2 times slower)
> testing continuous writes on a 40GB partition on a Windows VM, with a
> completely different tool.
>
> Right now I'm going back to filestore for the OSDs, so additional tests
> are possible if that helps.
> - Rado
>
> > -----Original Message-----
> > From: ceph-users [mailto:[email protected]] On Behalf
> > Of Mark Nelson
> > Sent: Tuesday, November 14, 2017 4:04 PM
> > To: [email protected]
> > Subject: Re: [ceph-users] Bluestore performance 50% of filestore
> >
> > Hi Radoslav,
> >
> > Is RBD cache enabled and in writeback mode? Do you have client-side
> > readahead?
> >
> > Both are doing better for writes than you'd expect from the native
> > performance of the disks, assuming they are typical 7200RPM drives
> > and you are using 3X replication (~150 IOPS * 27 / 3 = ~1350 IOPS).
> > Given the small file size, I'd expect that you might be getting
> > better journal coalescing in filestore.
> >
> > Sadly, I imagine you can't do a comparison test at this point, but
> > I'd be curious how it would look if you used libaio with a high
> > iodepth and a much bigger partition to do random writes over.
> >
> > Mark
> >
> > On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
> >> Hi,
> >>
> >> We have a 3-node, 27-OSD cluster running Luminous 12.2.1.
> >>
> >> In the filestore configuration there are 3 SSDs used for journals of
> >> 9 OSDs on each host (1 SSD has 3 journal partitions for 3 OSDs).
> >>
> >> I've converted filestore to bluestore by wiping 1 host at a time and
> >> waiting for recovery. The SSDs now contain block-db – again, one SSD
> >> serving 3 OSDs.
> >>
> >> The cluster is used as storage for OpenStack.
> >>
> >> Running fio on a VM in that OpenStack reveals bluestore performance
> >> almost twice slower than filestore.
> >>
> >> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G
> >> --numjobs=2 --time_based --runtime=180 --group_reporting
> >>
> >> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G
> >> --numjobs=2 --time_based --runtime=180 --group_reporting
> >>
> >> Filestore
> >>
> >> write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
> >> write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
> >> write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
> >>
> >> read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
> >> read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
> >> read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
> >>
> >> Bluestore
> >>
> >> write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
> >> write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
> >> write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec
> >>
> >> read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
> >> read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
> >> read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec
> >>
> >> - Rado
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> [email protected]
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
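Tying the numbers in the quoted thread together: Mark's back-of-envelope ceiling (~150 IOPS per 7200RPM drive, 27 OSDs, 3x replication) can be checked with a one-liner:

```shell
# Expected raw 4k random-write ceiling: per-disk IOPS times OSD count,
# divided by the replication factor (all three numbers from the thread).
hdd_iops=150
osds=27
replicas=3
echo "expected ceiling: $(( hdd_iops * osds / replicas )) IOPS"
```

Both the filestore (~5000) and bluestore (~2200) write results above exceed that ceiling, which is why Mark points at journal coalescing of small writes rather than raw disk speed.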
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
