On Mon, 7 Jul 2014, Luis Pabón wrote:
> What about the following use case (please forgive some of my ceph
> architecture ignorance):
>
> If it were possible to set up an OSD caching tier at the host (if the host
> had a dedicated SSD for accelerating I/O), then caching pools could be
> created to cache VM rbds, since they are inherently exclusive to a single
> host. Using a write-through (or a readonly, depending on the workload)
> policy would yield a major increase in VM IOPS. Using a writethrough or
> readonly policy would also ensure any writes are first written to the back
> end storage tier. Enabling hosts to service most of their VM I/O reads
> would also increase the overall IOPS of the back end storage tier.
This could be accomplished by doing a rados pool per client host. The
rados caching only works as a writeback cache, though, not write-through,
so you really need to replicate it for it to be usable in practice. So
although it's possible, this isn't a particularly attractive approach.
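
For illustration, a rough sketch of what a per-host tier might look like
with the CLI (the pool names are made up, and a CRUSH rule that pins the
cache pool to that host's SSD is assumed to exist already):

    # cache pool dedicated to a single client host; pg counts are placeholders
    ceph osd pool create host1-cache 128 128
    ceph osd tier add rbd host1-cache
    # writeback is the only caching mode that absorbs writes, so the
    # cache pool itself still needs replication to be safe
    ceph osd tier cache-mode host1-cache writeback
    ceph osd tier set-overlay rbd host1-cache
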
What you're describing is really a client-side write-through cache, either
for librbd or librados. We've discussed this in the past (mostly in the
context of a shared host-wide read-only data cache, not a write-through
one), but in both cases the caching would plug into the client libraries.
There are some CDS notes from emperor:
http://wiki.ceph.com/Planning/Sideboard/rbd%3A_shared_read_cache
http://pad.ceph.com/p/rbd-shared-read-cache
http://www.youtube.com/watch?v=SVgBdUv_Lv4&t=70m11s
Note that you can also accomplish this with the kernel rbd driver by
layering dm-cache or bcache or something similar on top and running it in
write-through mode. Most clients are (KVM+)librbd, though, so eventually
a userspace implementation for librbd (or maybe librados) makes sense.
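
For example, a rough bcache sketch along those lines (device and image
names are placeholders, the /sys/fs/bcache registration/attach steps are
omitted, and dm-cache would be set up differently):

    # map the image with the kernel client; shows up as e.g. /dev/rbd0
    rbd map rbd/vm-disk
    # build a bcache device: local SSD as cache, the rbd device as backing
    make-bcache -C /dev/sdb -B /dev/rbd0
    # run the cache in write-through so every write also hits the base pool
    echo writethrough > /sys/block/bcache0/bcache/cache_mode
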
sage
> Does this make sense?
>
> - Luis
>
> On 07/07/2014 03:29 PM, Sage Weil wrote:
> > On Mon, 7 Jul 2014, Luis Pabon wrote:
> > > Hi all,
> > > I am working on OSDMonitor.cc:5325 and wanted to confirm the following
> > > read_forward cache tier transitions:
> > >
> > > readforward -> forward || writeback || (any && num_objects_dirty == 0)
> > > forward -> writeback || readforward || (any && num_objects_dirty == 0)
> > > writeback -> readforward || forward
> > >
> > > Are these the correct cache tier state transitions?
> > That looks right to me.
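> >
> > For reference, a hypothetical walk-through of those transitions with the
> > CLI (the pool name "cache" is made up, and readforward is the new mode
> > being added here):
> >
> >   ceph osd tier cache-mode cache readforward   # writeback -> readforward
> >   ceph osd tier cache-mode cache forward       # readforward -> forward
> >   ceph osd tier cache-mode cache writeback     # forward -> writeback
> >   # dropping out of readforward or forward into any other mode should
> >   # only be accepted while the cache pool has no dirty objects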
> >
> > By the way, I had a thought after we spoke that we probably want something
> > that is somewhere in between the current writeback behavior (promote on
> > first read) and the read_forward behavior (never promote on read). I
> > suspect a good all-around policy is something like promote on second read?
> > This should probably be rolled into the writeback mode as a tunable...
> >
> > sage
> >
> >
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html