One more question. Seeing that the cache tier holds data until it reaches the
configured % ratio, I suppose I must set replication to 2 or higher on the
cache pool so that hot data not yet written to the cold storage is not lost in
case of a drive failure, right?
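If so, I assume it is just a matter of something like the following on the
cache pool (the pool name here is only a placeholder for mine):

  ceph osd pool set cache-pool size 2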

Also, will there be any performance penalty if I put the OSD journal on the
same SSD as the OSD itself? Right now I have one SSD dedicated to journaling
for the SSD OSDs. I know that in the case of mechanical drives this is a
problem!
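To be clear, what I have in mind is a ceph.conf entry roughly like this (the
OSD id and path are only an illustration), with the journal living on the same
SSD that holds the OSD data:

  [osd.5]
      # journal kept as a file on the data SSD instead of a separate device
      osd journal = /var/lib/ceph/osd/ceph-5/journal
      osd journal size = 10240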

And thank you for clearing these things up for me.

2016-01-12 18:03 GMT+02:00 Nick Fisk <[email protected]>:

> > -----Original Message-----
> > From: Mihai Gheorghe [mailto:[email protected]]
> > Sent: 12 January 2016 15:42
> > To: Nick Fisk <[email protected]>; [email protected]
> > Subject: Re: [ceph-users] Ceph cache tier and rbd volumes/SSD primary,
> HDD
> > replica crush rule!
> >
> >
> > 2016-01-12 17:08 GMT+02:00 Nick Fisk <[email protected]>:
> > > -----Original Message-----
> > > From: ceph-users [mailto:[email protected]] On Behalf
> > Of
> > > Mihai Gheorghe
> > > Sent: 12 January 2016 14:56
> > > To: Nick Fisk <[email protected]>; [email protected]
> > > Subject: Re: [ceph-users] Ceph cache tier and rbd volumes/SSD primary,
> > HDD
> > > replica crush rule!
> > >
> > > Thank you very much for the quick answer.
> > >
> > > I suppose the cache tier works the same way for object storage as well!?
> >
> > Yes, exactly the same. The cache is actually at the object layer anyway so
> > it works the same. You can actually pin/unpin objects from the cache as
> > well if you are using it at the object level.
> >
> > https://github.com/ceph/ceph/pull/6326
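> >
> > If I remember right, the pinning ends up exposed through the rados CLI,
> > along these lines (pool and object names here are just examples):
> >
> >   rados -p cache-pool cache-pin my-object
> >   rados -p cache-pool cache-unpin my-object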
> > >
> > > How is a delete of a cinder volume handled? I ask you this because after
> > > the volume got flushed to the cold storage, I then deleted it from
> > > cinder. It got deleted from the cache pool as well, but on the HDD pool,
> > > when issuing rbd -p ls, the volumes were gone but the space was still
> > > used (probably rados data) until I manually ran a flush command on the
> > > cache pool (I didn't wait too long to see if the space would be cleared
> > > in time). It is probably a misconfiguration on my end though.
> >
> > Ah yes, this is one of my pet hates. It's actually slightly worse than
> > what you describe. All the objects have to be promoted into the cache tier
> > to be deleted and then afterwards flushed, to remove them from the base
> > tier as well. For a large image, this can actually take quite a long time.
> > Hopefully this will be fixed at some point; I don't believe it would be too
> > difficult to fix.
> >
> > I assume this is done automatically and there is no need for a manual
> > flush, only if in a hurry, right?
> > What if the image is larger than the whole cache pool? I assume the image
> > will be promoted as smaller objects into the cache pool before deletion.
> > I can live with the extra time to delete a volume from the cold storage. My
> > only grudge is with the extra network load from the extra step of loading
> > the image into the cache tier to be deleted (the SSD used for the cache
> > pool resides on a different host), as I don't have 10Gb ports, only 1Gb,
> > six of them on every host in LACP mode.
>
> Yes, this is fine; the objects will just get promoted until the cache is
> full and then the deleted ones will be flushed out, and so on. The only
> problem is that it causes cache pollution, as it will force other objects
> out of the cache. Like I said, it's not the end of the world, but very
> annoying.
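>
> If you want to force it by hand in the meantime, something along these lines
> should flush and evict whatever it can from the cache pool (the pool name is
> just an example):
>
>   rados -p cache-pool cache-flush-evict-all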
>
> >
> > >
> > > In your opinion, is the cache tier ready for production? I have read
> > > that bcache (flashcache?) is used in favour of the cache tier, but it is
> > > not that simple to set up and there are disadvantages there as well.
> >
> > See my recent posts about cache tiering; there is a fairly major bug which
> > limits performance if your working set doesn't fit in the cache. Assuming
> > you are running the patch for this bug and you can live with the deletion
> > problem above... then yes, I would say that it's usable in production. I'm
> > planning to enable it on the production pool in my cluster in the next
> > couple of weeks.
> >
> > I'm sorry, I'm a bit new to the ceph mailing list. Where can I see your
> > recent posts? I really need to check that patch out!
> >
>
> Here is the patch; it's in master and is in the process of being backported
> to Hammer. I think for Infernalis you will need to manually patch and build.
>
>
> https://github.com/zhouyuan/ceph/commit/8ffb4fba2086f5758a3b260c05d16552e995c452
>
>
> > >
> > > Also, is there a problem if I add a cache tier to an already existing
> > > pool that has data on it? Or should the pool be empty prior to adding the
> > > cache tier?
> >
> > Nope, that should be fine.
> >
> >
> > I was asking this because I have a 5TB cinder volume with data on it
> > (mostly >3GB in size). I added a cache tier to the pool that holds the
> > volume and I can see chaotic behaviour from my W2012 instance, as in
> > deleting files takes a very long time and not all subdirectories work (I
> > get an error about not finding the directory with many small files).
>
> This could be related to the patch I mentioned. Without it, no matter what
> the promote recency settings are set to, objects will be promoted on almost
> every read/write, which can quickly overload the cluster with
> promotions/evictions, as even small FS reads will cause 4MB promotions.
> After the patch, Ceph will obey the settings.
>
> So you can set for example:
>
> hit_set_count = 10
> hit_set_period = 60
> read_recency = 3
> write_recency = 5
>
> This will generate a new hit set every minute and will keep 10 of them. If
> the last 3 hit sets contain the object, it will be promoted on that read
> request; if the last 5 hit sets contain the object, it will be promoted on
> the write request.
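>
> As pool settings that maps onto roughly the following (the pool name and the
> bloom hit set type are just assumptions on my part):
>
>   ceph osd pool set cache-pool hit_set_type bloom
>   ceph osd pool set cache-pool hit_set_count 10
>   ceph osd pool set cache-pool hit_set_period 60
>   ceph osd pool set cache-pool min_read_recency_for_promote 3
>   ceph osd pool set cache-pool min_write_recency_for_promote 5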
>
>
> >
> > >
> > > 2016-01-12 16:30 GMT+02:00 Nick Fisk <[email protected]>:
> > > > -----Original Message-----
> > > > From: ceph-users [mailto:[email protected]] On
> > Behalf
> > > Of
> > > > Mihai Gheorghe
> > > > Sent: 12 January 2016 14:25
> > > > To: [email protected]
> > > > Subject: [ceph-users] Ceph cache tier and rbd volumes/SSD primary,
> HDD
> > > > replica crush rule!
> > > >
> > > > Hello,
> > > >
> > > > I have a question about how the cache tier works with rbd volumes!?
> > > >
> > > > So I created a pool of SSDs for cache and a pool on HDDs for cold
> > > > storage that acts as the backend for cinder volumes. I create a volume
> > > > in cinder from an image and spawn an instance. The volume is created in
> > > > the cache pool as expected, and it will be flushed to the cold storage
> > > > after a period of inactivity or after the cache pool reaches 40% full,
> > > > as I understand.
> > >
> > > The cache won't be flushed after inactivity; the cache agent only works
> > > on % full (either # of objects or bytes).
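> > >
> > > Those thresholds are pool settings, for example something like this (the
> > > pool name and values are only an illustration):
> > >
> > >   ceph osd pool set cache-pool cache_target_dirty_ratio 0.4
> > >   ceph osd pool set cache-pool cache_target_full_ratio 0.8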
> > >
> > > >
> > > > Now, after the volume is flushed to the HDD and I make a read or write
> > > > request in the guest OS, how does Ceph handle it? Does it upload the
> > > > whole rbd volume from the cold storage to the cache pool, or only a
> > > > chunk of it where the request is made from the guest OS?
> > >
> > > The cache works on hot objects, so particular objects (normally 4MB) of
> > > the RBD will be promoted/demoted over time depending on access patterns.
> > >
> > > >
> > > > Also, is the replication in Ceph synchronous or async? If I set a
> > > > CRUSH rule to use the SSD host as primary and the HDD host for the
> > > > replica, would the writes and reads on the SSDs be slowed down by the
> > > > replication on the mechanical drives?
> > > > Would this configuration be viable? (I ask this because I don't have
> > > > enough SSDs to make a pool of size 3 on them.)
> > >
> > > It's sync replication. If you have a very heavy read workload, you can
> > > do what you suggest and set the SSD OSD to be the primary copy for each
> > > PG; writes will still be limited to the speed of the spinning disks, but
> > > reads will be serviced from the SSDs. However, there is a risk in
> > > degraded scenarios that your performance could dramatically drop if more
> > > IO is diverted to spinning disks.
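> > >
> > > One way to get that effect, purely as a sketch (the OSD ids are made up,
> > > and you may need mon_osd_allow_primary_affinity = true), is to drop the
> > > primary affinity of the HDD OSDs so the SSD copies are preferred as
> > > primaries:
> > >
> > >   ceph osd primary-affinity osd.10 0
> > >   ceph osd primary-affinity osd.11 0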
> > >
> > > >
> > > > Thank you!
> >
>
>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
