It could act like a HashSet of KeyValues keyed on rowKey + family + qualifier, but not including the timestamp. As writes come in it would either invalidate the previous version and let the next read repopulate it (read-through) or overwrite it in place (write-through). It would only service point queries where row + family + qualifier are all specified, returning the latest version. It wouldn't be able to serve a typical rowKey-only Get (which is a scan behind the scenes) because it couldn't know whether it held all the cells in the row, but if you could specify all of your row's qualifiers up-front it could work. Something like the sketch below.
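Here's a rough sketch of what I mean (all the names are made up, and it ignores deletes, TTLs, versioned reads, and memory bounds; it's just to illustrate the keying and the write-through path):

    import java.util.Arrays;
    import java.util.concurrent.ConcurrentHashMap;

    /**
     * Hypothetical point-query KV cache: keyed on row + family + qualifier
     * with the timestamp excluded, so each key holds only the latest value.
     */
    public class PointQueryKvCache {

        /** Cache key: row + family + qualifier; timestamp deliberately omitted. */
        private static final class CellKey {
            private final byte[] row, family, qualifier;
            private final int hash;

            CellKey(byte[] row, byte[] family, byte[] qualifier) {
                this.row = row;
                this.family = family;
                this.qualifier = qualifier;
                this.hash = 31 * (31 * Arrays.hashCode(row)
                        + Arrays.hashCode(family)) + Arrays.hashCode(qualifier);
            }

            @Override
            public boolean equals(Object o) {
                if (!(o instanceof CellKey)) return false;
                CellKey k = (CellKey) o;
                return Arrays.equals(row, k.row)
                        && Arrays.equals(family, k.family)
                        && Arrays.equals(qualifier, k.qualifier);
            }

            @Override
            public int hashCode() { return hash; }
        }

        private final ConcurrentHashMap<CellKey, byte[]> cache =
                new ConcurrentHashMap<CellKey, byte[]>();

        /** Write-through: every upsert overwrites the previously cached version. */
        public void onPut(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
            cache.put(new CellKey(row, family, qualifier), value);
        }

        /**
         * Point query only: row, family, and qualifier must all be specified.
         * A null return is a miss, and the caller falls back to the normal
         * memstore / block cache read path. Scans bypass this cache entirely.
         */
        public byte[] getLatest(byte[] row, byte[] family, byte[] qualifier) {
            return cache.get(new CellKey(row, family, qualifier));
        }
    }

A Delete would still have to invalidate the entry (or cache a tombstone), and a rowKey-only Get could only be answered from here if the client enumerated every qualifier the row might contain.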
On Wed, Apr 4, 2012 at 2:30 PM, Vladimir Rodionov <[email protected]> wrote:

> 1. 2KB can be too large for some applications. For example, some of our
> k-v sizes are < 100 bytes combined.
> 2. These tables (from 1.) do not benefit from the block cache at all (we
> did not try a 100 B block size yet :)
> 3. And Matt is absolutely right: small block sizes are expensive.
>
> How about doing point queries on the K-V cache and bypassing the K-V
> cache on all Scans (when someone really needs them)?
> Implement the K-V cache as a coprocessor application?
>
> Invalidation of a K-V entry is not necessary if all upsert operations go
> through the K-V cache first, since it sits in front of the MemStore.
> There will be no "stale or invalid" data situation in this case. Correct?
> No need for the data to be sorted and no need for it to be merged
> into a scan (we do not use the K-V cache for Scans).
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Matt Corgan [[email protected]]
> Sent: Wednesday, April 04, 2012 11:40 AM
> To: [email protected]
> Subject: Re: keyvalue cache
>
> I guess the benefit of the KV cache is that you are not holding entire
> 64K blocks in memory when you only care about 200 bytes of them. Would
> an alternative be to set a small block size (2KB or less)?
>
> The problems with small block sizes would be expensive block cache
> management overhead and inefficient scanning IO due to lack of
> read-ahead. Maybe improving the cache management and read-ahead would be
> more general improvements that don't add as much complexity?
>
> I'm having a hard time envisioning how you would do invalidations on the
> KV cache and how you would merge its entries into a scan, etc. Would it
> basically be a memstore in front of the memstore, where KVs get
> individually invalidated instead of bulk-flushed? Would it be sorted or
> hashed?
>
> Matt
>
> On Wed, Apr 4, 2012 at 10:35 AM, Enis Söztutar <[email protected]> wrote:
>
> > As you said, caching the entire row does not make much sense, given
> > that the families are by contract the access boundaries. But caching
> > column families might be a good trade-off for dealing with the
> > per-item overhead.
> >
> > Also agreed on the cache being configurable at the table or, better,
> > the CF level. I think we can do something like enable_block_cache =
> > true, enable_kv_cache = false, per column family.
> >
> > Enis
> >
> > On Tue, Apr 3, 2012 at 11:03 PM, Vladimir Rodionov
> > <[email protected]> wrote:
> >
> > > This usually makes sense for tables with mostly random access (point
> > > queries). For short and long scans the block cache is preferable.
> > > Cassandra has one (the Row cache), but since it caches the whole row
> > > (which can be very large) it has sub-par performance in many cases.
> > > It makes sense to make caching configurable: a table can use the
> > > key-value cache and not the block cache, and vice versa.
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: [email protected]
> > >
> > > ________________________________________
> > > From: Enis Söztutar [[email protected]]
> > > Sent: Tuesday, April 03, 2012 3:34 PM
> > > To: [email protected]
> > > Subject: keyvalue cache
> > >
> > > Hi,
> > >
> > > Before opening the issue, I thought I should ask around first. What
> > > do you think about a keyvalue cache sitting on top of the block
> > > cache? It is mentioned in the Bigtable paper, and it seems that
> > > zipfian kv access patterns might benefit from something like this a
> > > lot. I could not find anybody who proposed it before.
> > >
> > > What do you guys think? Should we pursue a kv query-cache? My gut
> > > feeling says that especially for some workloads we might gain
> > > significant performance improvements, but we cannot verify that
> > > until we implement and profile it, right?
> > >
> > > Thanks,
> > > Enis
