Well, I wouldn't use bcache on FileStore at all. First, there are all the problems 
you describe below, and second (and more importantly) you got double writes: in 
FileStore, data was written to the journal and to the storage disk at the same 
time. If the journal and the data disk shared the same device, effective 
throughput was halved, which gave really bad results.
In BlueStore things change quite a lot. First, there are no double writes: there 
is no "journal" (well, there is something called a WAL, but it's not used in the 
same way). Data goes directly to the data disk, and you only write a small amount 
of metadata and commit it to the DB. Rebalancing and scrub go through RocksDB 
rather than a file system, which makes them much simpler and more effective, so 
you shouldn't see the problems you had with FileStore.
In addition, cache tiering has been deprecated in Red Hat Ceph Storage, so I 
personally wouldn't use something the developers and support have deprecated.
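For reference, the usual BlueStore layout with RocksDB on the SSD can be set up 
with ceph-volume; a minimal sketch only, with example device names (assuming 
/dev/sdb is the HDD and /dev/nvme0n1p1 is the ~30 GB SSD partition):

```shell
# Create a BlueStore OSD with data on the HDD and RocksDB on a fast
# SSD partition (the WAL lives inside block.db unless placed separately).
ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db /dev/nvme0n1p1
```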

-------- Original message --------
From: Marek Grzybowski <[email protected]>
Date: 13/10/17 12:22 AM (GMT+01:00)
To: Jorge Pinilla López <[email protected]>, [email protected]
Subject: Re: [ceph-users] using Bcache on blueStore
On 12.10.2017 20:28, Jorge Pinilla López wrote:
> Hey all!
> I have a Ceph cluster with multiple HDDs and 1 really fast SSD (30 GB per OSD) per 
> host.
> 
> I have been thinking about this: all the docs say that I should give all the SSD 
> space to RocksDB, so I would have an HDD for data and a 30 GB partition for RocksDB.
> 
> But it came to my mind that if the OSD isn't full, maybe I am not using all the 
> space on the SSD; or maybe I would prefer having a really small amount of hot k/v 
> and metadata, plus the data itself, on a really fast device rather than just storing 
> all the cold metadata there.
> 
> So I thought of using bcache to make the SSD a cache; as metadata and 
> k/v are usually hot, they should end up in the cache. But this doesn't 
> guarantee that k/v and metadata are actually always on the SSD, because under 
> heavy cache load they can be pushed out (e.g. by really big data files).
> 
> So I came up with the idea of setting small 5-10 GB partitions for the hot 
> RocksDB and using the rest as a cache. That way I make sure that really hot 
> metadata is actually always on the SSD, and the colder metadata should also be on 
> the SSD (via bcache) if it's not really freezing; in that case it would be 
> pushed to the HDD. It also doesn't make any sense to have metadata that you 
> never use taking up space on the SSD; I'd rather use that space to store hotter 
> data.
> 
> This would also make writes faster, and in BlueStore we don't have the double 
> write problem, so it should work fine.
> 
> What do you think about this? Does it have any downside? Is there any other 
> way?
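The hybrid layout described above would look roughly like this with the standard 
bcache tooling; a sketch only, under assumed device names (/dev/nvme0n1p2 as the 
caching partition, /dev/nvme0n1p1 as the small hot-DB partition, /dev/sdb as the 
backing HDD):

```shell
# Make the HDD a bcache backing device and the SSD partition a cache device.
make-bcache -B /dev/sdb          # backing device -> appears as /dev/bcache0
make-bcache -C /dev/nvme0n1p2    # cache device

# Attach the cache set to the backing device (cset UUID from bcache-super-show).
bcache-super-show /dev/nvme0n1p2 | grep cset.uuid
echo <cset-uuid> > /sys/block/bcache0/bcache/attach

# Then create the OSD on the cached device, with the small 5-10 GB
# partition holding the hot RocksDB as block.db.
ceph-volume lvm create --bluestore \
    --data /dev/bcache0 \
    --block.db /dev/nvme0n1p1
```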

Hi Jorge
  I was inexperienced and tried bcache on an old FileStore setup once. It was bad,
mostly because bcache does not implement any of the typical disk scheduling 
algorithms, so when a scrub or rebalance was running, latency on that storage was 
very high and unpredictable.
The OSD daemon could not apply any ioprio to disk reads or writes, and additionally
the bcache cache was poisoned by scrub/rebalance traffic.
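For what it's worth, bcache does expose a couple of sysfs knobs that can limit 
this kind of cache poisoning; a sketch, assuming the cached device shows up as 
bcache0:

```shell
# Bypass the cache for large sequential streams (scrub/rebalance-style IO):
# sequential runs longer than this cutoff go straight to the backing HDD.
echo 4M > /sys/block/bcache0/bcache/sequential_cutoff

# Optionally keep the cache in writethrough mode, so a cache-device
# failure cannot lose acknowledged writes.
echo writethrough > /sys/block/bcache0/bcache/cache_mode
```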

Fortunately for me, it is very easy to do a rolling replacement of OSDs.
I use some SSD partitions for journals now, and what is left over for pure SSD 
storage. This works really great.

If I ever need a cache, I will use cache tiering instead.


-- 
  Kind Regards
    Marek Grzybowski





_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
