One more question. Seeing that the cache tier holds data until it reaches the configured % full ratio, I suppose I must set replication to 2 or higher on the cache pool, so that hot data not yet written to the cold storage is not lost in case of a drive failure, right?
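Something along these lines is what I have in mind, if I read the docs right (the pool name "ssd-cache" is just a placeholder for my cache pool):

    # Keep two copies of every object in the cache tier, so a single SSD
    # failure does not lose dirty data that has not been flushed yet.
    ceph osd pool set ssd-cache size 2
    # Keep serving I/O from the remaining copy while recovery runs.
    ceph osd pool set ssd-cache min_size 1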
Also, will there be any performance penalty if I put the OSD journal on the same SSD as the OSD? Right now I have one SSD dedicated to journaling the SSD OSDs. I know that in the case of mechanical drives this is a problem! And thank you for clearing these things up for me.

2016-01-12 18:03 GMT+02:00 Nick Fisk <[email protected]>:
> > -----Original Message-----
> > From: Mihai Gheorghe [mailto:[email protected]]
> > Sent: 12 January 2016 15:42
> > To: Nick Fisk <[email protected]>; [email protected]
> > Subject: Re: [ceph-users] Ceph cache tier and rbd volumes/SSD primary, HDD replica crush rule!
> >
> > 2016-01-12 17:08 GMT+02:00 Nick Fisk <[email protected]>:
> > > -----Original Message-----
> > > From: ceph-users [mailto:[email protected]] On Behalf Of Mihai Gheorghe
> > > Sent: 12 January 2016 14:56
> > > To: Nick Fisk <[email protected]>; [email protected]
> > > Subject: Re: [ceph-users] Ceph cache tier and rbd volumes/SSD primary, HDD replica crush rule!
> > >
> > > Thank you very much for the quick answer.
> > >
> > > I suppose cache tier works the same way for object storage as well!?
> >
> > Yes, exactly the same. The cache is actually at the object layer anyway, so it works the same. You can actually pin/unpin objects from the cache as well if you are using it at the object level.
> >
> > https://github.com/ceph/ceph/pull/6326
> >
> > > How is a delete of a cinder volume handled? I ask this because after the volume got flushed to the cold storage, I then deleted it from cinder. It got deleted from the cache pool as well, but on the HDD pool, when issuing rbd -p ls, the volumes were gone while the space was still used (probably rados data) until I manually ran a flush command on the cache pool (I didn't wait long to see if the space would be cleared in time). It is probably a misconfiguration on my end though.
> >
> > Ah yes, this is one of my pet hates. It's actually slightly worse than what you describe. All the objects have to be promoted into the cache tier to be deleted and then afterwards flushed to remove them from the base tier as well. For a large image this can take quite a long time. Hopefully this will be fixed at some point; I don't believe it would be too difficult to fix.
> >
> > I assume this is done automatically and there is no need for a manual flush, unless in a hurry, right?
> > What if the image is larger than the whole cache pool? I assume the image will be promoted in smaller objects into the cache pool before deletion.
> > I can live with the extra time to delete a volume from the cold storage. My only grudge is the extra network load from the extra step of loading the image into the cache tier to be deleted (the SSDs used for the cache pool reside on a different host), as I don't have 10Gb ports, only 1Gb, six of them on every host in LACP mode.
>
> Yes, this is fine; the objects will just get promoted until the cache is full and then the deleted ones will be flushed out, and so on. The only problem is that it causes cache pollution, as it will force other objects out of the cache. Like I said, it's not the end of the world, but very annoying.
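(For reference, the kind of manual flush I mean is something like the command below; "ssd-cache" again just stands in for my cache pool name, so treat it as a rough sketch:)

    # flush dirty objects down to the base tier and then evict everything
    # from the cache pool; slow on a big cache, but it frees the space
    rados -p ssd-cache cache-flush-evict-all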
> > > In your opinion, is cache tier ready for production? I have read that bcache (flashcache?) is used in favour of cache tier, but it is not that simple to set up and there are disadvantages there as well.
> >
> > See my recent posts about cache tiering; there is a fairly major bug which limits performance if your working set doesn't fit in the cache. Assuming you are running the patch for this bug and you can live with the deletion problem above... then yes, I would say that it's usable in production. I'm planning to enable it on the production pool in my cluster in the next couple of weeks.
> >
> > I'm sorry, I'm a bit new to the ceph mailing list. Where can I see your recent posts? I really need to check that patch out!
>
> Here is the patch; it's in master and is in the process of being backported to Hammer. I think for Infernalis you will need to manually patch and build.
>
> https://github.com/zhouyuan/ceph/commit/8ffb4fba2086f5758a3b260c05d16552e995c452
>
> > > Also, is there a problem if I add a cache tier to an already existing pool that has data on it? Or should the pool be empty prior to adding the cache tier?
> >
> > Nope, that should be fine.
> >
> > I was asking this because I have a 5TB cinder volume with data on it (mostly >3GB in size). I added a cache tier to the pool that holds the volume, and I can see chaotic behaviour from my W2012 instance, as in deleting files takes a very long time and not all subdirectories work (I get an error about not finding the directory with many small files).
>
> This could be related to the patch I mentioned. Without it, no matter what the promote recency settings are set to, objects will be promoted at almost every read/write, which can quickly overload the cluster with promotions/evictions, as even small FS reads will cause 4MB promotions. After the patch, Ceph will obey the settings.
>
> So you can set, for example:
>
> hit_set_count = 10
> hit_set_period = 60
> read_recency = 3
> write_recency = 5
>
> This will generate a new hit set every minute and will keep 10 of them. If the last 3 hit sets contain the object, it will be promoted on that read request; if the last 5 hit sets contain the object, it will be promoted on the write request.
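(If I'm reading that right, on my setup those settings would translate into something like the commands below; "ssd-cache" is again just a placeholder for my cache pool, and I'm assuming the two recency knobs map to min_read_recency_for_promote and min_write_recency_for_promote:)

    # bloom filters are the usual hit set type for tracking object accesses
    ceph osd pool set ssd-cache hit_set_type bloom
    ceph osd pool set ssd-cache hit_set_count 10
    ceph osd pool set ssd-cache hit_set_period 60
    # promote on read only if the object appears in the last 3 hit sets,
    # and on write only if it appears in the last 5
    ceph osd pool set ssd-cache min_read_recency_for_promote 3
    ceph osd pool set ssd-cache min_write_recency_for_promote 5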
> > > 2016-01-12 16:30 GMT+02:00 Nick Fisk <[email protected]>:
> > > > -----Original Message-----
> > > > From: ceph-users [mailto:[email protected]] On Behalf Of Mihai Gheorghe
> > > > Sent: 12 January 2016 14:25
> > > > To: [email protected]
> > > > Subject: [ceph-users] Ceph cache tier and rbd volumes/SSD primary, HDD replica crush rule!
> > > >
> > > > Hello,
> > > >
> > > > I have a question about how cache tier works with rbd volumes!?
> > > >
> > > > So I created a pool of SSDs for cache and a pool on HDDs for cold storage that acts as the backend for cinder volumes. I create a volume in cinder from an image and spawn an instance. The volume is created in the cache pool as expected, and it will be flushed to the cold storage after a period of inactivity or after the cache pool reaches 40% full, as I understand it.
> > >
> > > The cache won't be flushed after inactivity; the cache agent only works on % full (either # of objects or bytes).
> > >
> > > > Now after the volume is flushed to the HDD and I make a read or write request in the guest OS, how does Ceph handle it? Does it upload the whole rbd volume from the cold storage to the cache pool, or only a chunk of it where the request is made from the guest OS?
> > >
> > > The cache works on hot objects, so particular objects (normally 4MB) of the RBD will be promoted/demoted over time depending on access patterns.
> > >
> > > > Also, is the replication in Ceph synchronous or async? If I set a crush rule to use the SSD host as primary and the HDD host for the replica, would the writes and reads on the SSDs be slowed down by the replication on the mechanical drives?
> > > > Would this configuration be viable? (I ask this because I don't have the number of SSDs to make a pool of size 3 on them.)
> > >
> > > It's sync replication. If you have a very heavy read workload, you can do what you suggest and set the SSD OSD to be the primary copy for each PG; writes will still be limited to the speed of the spinning disks, but reads will be serviced from the SSDs. However, there is a risk in degraded scenarios that your performance could drop dramatically if more IO is diverted to the spinning disks.
> > >
> > > > Thank you!
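(And for completeness, the kind of crush rule I had in mind for the SSD-primary/HDD-replica setup is roughly the stock "ssd-primary" example from the CRUSH documentation, assuming the crush map already has separate ssd and hdd roots; I have not tested this on my cluster:)

    rule ssd-primary {
            ruleset 5
            type replicated
            min_size 1
            max_size 10
            # the first replica (which becomes the primary) comes from the ssd root
            step take ssd
            step chooseleaf firstn 1 type host
            step emit
            # the remaining replicas come from the hdd root
            step take hdd
            step chooseleaf firstn -1 type host
            step emit
    }

With size 2 on the pool, that should put one copy on an SSD host (acting as primary) and one on an HDD host, which matches what Nick describes above.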
