It could act like a HashSet of KeyValues keyed on rowKey + family + qualifier, but not including the timestamp. As writes come in it would either invalidate the previous version and let the next read repopulate it (read-through) or overwrite it in place (write-through). It would only service point queries where row + family + qualifier are all specified, returning the latest version. It wouldn't be able to serve a typical rowKey-only Get (which is a scan behind the scenes) because it couldn't know whether it held all the cells in the row, but if you could specify all of your row's qualifiers up-front it could work. Something like the sketch below.
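Here's a rough sketch of what I mean (all the names are made up, and it ignores deletes, TTLs, versioned reads, and memory bounds; it's just to illustrate the keying and the write-through path):

    import java.util.Arrays;
    import java.util.concurrent.ConcurrentHashMap;

    /**
     * Hypothetical point-query KV cache: keyed on row + family + qualifier
     * with the timestamp excluded, so each key holds only the latest value.
     */
    public class PointQueryKvCache {

        /** Cache key: row + family + qualifier; timestamp deliberately omitted. */
        private static final class CellKey {
            private final byte[] row, family, qualifier;
            private final int hash;

            CellKey(byte[] row, byte[] family, byte[] qualifier) {
                this.row = row;
                this.family = family;
                this.qualifier = qualifier;
                this.hash = 31 * (31 * Arrays.hashCode(row)
                        + Arrays.hashCode(family)) + Arrays.hashCode(qualifier);
            }

            @Override
            public boolean equals(Object o) {
                if (!(o instanceof CellKey)) return false;
                CellKey k = (CellKey) o;
                return Arrays.equals(row, k.row)
                        && Arrays.equals(family, k.family)
                        && Arrays.equals(qualifier, k.qualifier);
            }

            @Override
            public int hashCode() { return hash; }
        }

        private final ConcurrentHashMap<CellKey, byte[]> cache =
                new ConcurrentHashMap<CellKey, byte[]>();

        /** Write-through: every upsert overwrites the previously cached version. */
        public void onPut(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
            cache.put(new CellKey(row, family, qualifier), value);
        }

        /**
         * Point query only: row, family, and qualifier must all be specified.
         * A null return is a miss, and the caller falls back to the normal
         * memstore / block cache read path. Scans bypass this cache entirely.
         */
        public byte[] getLatest(byte[] row, byte[] family, byte[] qualifier) {
            return cache.get(new CellKey(row, family, qualifier));
        }
    }

A Delete would still have to invalidate the entry (or cache a tombstone), and a rowKey-only Get could only be answered from here if the client enumerated every qualifier the row might contain.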
On Wed, Apr 4, 2012 at 2:30 PM, Vladimir Rodionov <[email protected]> wrote:

> 1. 2KB can be too large for some applications. For example, some of our
> k-v sizes are < 100 bytes combined.
> 2. These tables (from 1.) do not benefit from the block cache at all (we
> did not try a 100 B block size yet :)
> 3. And Matt is absolutely right: small block sizes are expensive.
>
> How about doing point queries on the K-V cache and bypassing the K-V
> cache on all Scans (when someone really needs them)?
> Implement the K-V cache as a coprocessor application?
>
> Invalidation of a K-V entry is not necessary if all upsert operations go
> through the K-V cache first, since it sits in front of the MemStore.
> There will be no "stale or invalid" data situation in this case. Correct?
> No need for the data to be sorted and no need for it to be merged
> into a scan (we do not use the K-V cache for Scans).
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Matt Corgan [[email protected]]
> Sent: Wednesday, April 04, 2012 11:40 AM
> To: [email protected]
> Subject: Re: keyvalue cache
>
> I guess the benefit of the KV cache is that you are not holding entire
> 64K blocks in memory when you only care about 200 bytes of them. Would
> an alternative be to set a small block size (2KB or less)?
>
> The problems with small block sizes would be expensive block cache
> management overhead and inefficient scanning IO due to lack of
> read-ahead. Maybe improving the cache management and read-ahead would be
> more general improvements that don't add as much complexity?
>
> I'm having a hard time envisioning how you would do invalidations on the
> KV cache and how you would merge its entries into a scan, etc. Would it
> basically be a memstore in front of the memstore, where KVs get
> individually invalidated instead of bulk-flushed? Would it be sorted or
> hashed?
>
> Matt
>
> On Wed, Apr 4, 2012 at 10:35 AM, Enis Söztutar <[email protected]> wrote:
>
> > As you said, caching the entire row does not make much sense, given
> > that the families are by contract the access boundaries. But caching
> > column families might be a good trade-off for dealing with the
> > per-item overhead.
> >
> > Also agreed on the cache being configurable at the table or, better,
> > the CF level. I think we can do something like enable_block_cache =
> > true, enable_kv_cache = false, per column family.
> >
> > Enis
> >
> > On Tue, Apr 3, 2012 at 11:03 PM, Vladimir Rodionov
> > <[email protected]> wrote:
> >
> > > This usually makes sense for tables with mostly random access (point
> > > queries). For short and long scans the block cache is preferable.
> > > Cassandra has one (the Row cache), but since it caches the whole row
> > > (which can be very large) it has sub-par performance in many cases.
> > > It makes sense to make caching configurable: a table can use the
> > > key-value cache and not the block cache, and vice versa.
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: [email protected]
> > >
> > > ________________________________________
> > > From: Enis Söztutar [[email protected]]
> > > Sent: Tuesday, April 03, 2012 3:34 PM
> > > To: [email protected]
> > > Subject: keyvalue cache
> > >
> > > Hi,
> > >
> > > Before opening the issue, I thought I should ask around first. What
> > > do you think about a keyvalue cache sitting on top of the block
> > > cache? It is mentioned in the Bigtable paper, and it seems that
> > > zipfian kv access patterns might benefit from something like this a
> > > lot. I could not find anybody who proposed it before.
> > >
> > > What do you guys think? Should we pursue a kv query-cache? My gut
> > > feeling says that especially for some workloads we might gain
> > > significant performance improvements, but we cannot verify that
> > > until we implement and profile it, right?
> > >
> > > Thanks,
> > > Enis
