I'm not sure about memcached- or coprocessor-based implementations, since you would lose a consistent view of your data. I think one of the Lucene-over-HBase implementations uses a memory cache (can't remember if it was memcached) over the HBase index readers and writers. You can deploy memcached with zero code changes to HBase, but I haven't heard of anyone other than those guys doing it. Has anyone tried it?

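To make the consistency point concrete, here is a rough sketch of the point-query cache described in the quoted thread: keyed on row+family+qualifier without the timestamp, write-through so every upsert passes through the cache first, and read-through on a miss. It is written in Python only to keep it short. BackingStore, PointQueryCache, and the LRU eviction policy are my own assumptions for illustration; none of them are HBase or memcached APIs, and deletes and multi-version reads are left out.

```python
from collections import OrderedDict
from typing import Dict, Optional, Tuple

# Cache key: (row, family, qualifier). The timestamp is deliberately left out,
# so a newer write replaces the older version of the same cell.
CellKey = Tuple[bytes, bytes, bytes]
Version = Tuple[int, bytes]  # (timestamp, value)


class BackingStore:
    """Hypothetical stand-in for the MemStore/HFile path; not an HBase API."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, Version] = {}
        self.reads = 0  # how many times the slow read path was used

    def put(self, key: CellKey, timestamp: int, value: bytes) -> Version:
        """Apply the write and return the version that is now the latest."""
        current = self._cells.get(key)
        if current is None or timestamp >= current[0]:
            self._cells[key] = (timestamp, value)
        return self._cells[key]

    def get_latest(self, key: CellKey) -> Optional[Version]:
        self.reads += 1
        return self._cells.get(key)


class PointQueryCache:
    """Write-through cache that only answers fully specified point queries."""

    def __init__(self, store: BackingStore, capacity: int = 1024) -> None:
        self._store = store
        self._capacity = capacity
        self._entries: "OrderedDict[CellKey, Version]" = OrderedDict()

    def put(self, row: bytes, family: bytes, qualifier: bytes,
            timestamp: int, value: bytes) -> None:
        key = (row, family, qualifier)
        # Every upsert goes through the cache, so the cached entry is
        # overwritten with whatever the store now considers the latest version.
        latest = self._store.put(key, timestamp, value)
        self._remember(key, latest)

    def get(self, row: bytes, family: bytes, qualifier: bytes) -> Optional[bytes]:
        key = (row, family, qualifier)
        cached = self._entries.get(key)
        if cached is not None:
            self._entries.move_to_end(key)  # mark as recently used
            return cached[1]
        found = self._store.get_latest(key)  # read-through on a miss
        if found is None:
            return None
        self._remember(key, found)
        return found[1]

    def _remember(self, key: CellKey, version: Version) -> None:
        self._entries[key] = version
        self._entries.move_to_end(key)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used


if __name__ == "__main__":
    store = BackingStore()
    cache = PointQueryCache(store, capacity=2)
    cache.put(b"row1", b"cf", b"q1", 1, b"old")
    cache.put(b"row1", b"cf", b"q1", 2, b"new")
    print(cache.get(b"row1", b"cf", b"q1"), store.reads)
```

The "no stale data" argument only holds here because all writes go through put() in the same process. With an external memcached that HBase knows nothing about, any writer that bypasses the cache leaves an old value behind, which is the consistency loss I meant above.
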
On Wed, Apr 4, 2012 at 2:53 PM, Matt Corgan <[email protected]> wrote:
> in the mean time, memcached could provide all those benefits without adding
> any complexity to hbase...
>
>
> On Wed, Apr 4, 2012 at 2:46 PM, Matt Corgan <[email protected]> wrote:
>
> > It could act like a HashSet of KeyValues keyed on the
> > rowKey+family+qualifier but not including the timestamp. As writes come in
> > it would evict or overwrite previous versions (read-through vs
> > write-through). It would only service point queries where the
> > row+fam+qualifier are specified, returning the latest version. Wouldn't be
> > able to do a typical rowKey-only Get (scan behind the scenes) because it
> > wouldn't know if it contained all the cells in the row, but if you could
> > specify all your row's qualifiers up-front it could work.
> >
> >
> > On Wed, Apr 4, 2012 at 2:30 PM, Vladimir Rodionov <[email protected]> wrote:
> >
> >> 1. 2KB can be too large for some applications. For example, some of our
> >> k-v sizes < 100 bytes combined.
> >> 2. These tables (from 1.) do not benefit from block cache at all (we did
> >> not try 100 B block size yet :)
> >> 3. And Matt is absolutely right: small block size is expensive
> >>
> >> How about doing point queries on K-V cache and bypass K-V cache on all
> >> Scans (when someone really need this)?
> >> Implement K-V cache as a coprocessor application?
> >>
> >> Invalidation of K-V entry is not necessary if all upserts operations go
> >> through K-V cache firstly if it sits in front of MemStore.
> >> There will be no "stale or invalid" data situation in this case. Correct?
> >> No need for data to be sorted and no need for data to be merged
> >> into a scan (we do not use K-V cache for Scans)
> >>
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: [email protected]
> >>
> >> ________________________________________
> >> From: Matt Corgan [[email protected]]
> >> Sent: Wednesday, April 04, 2012 11:40 AM
> >> To: [email protected]
> >> Subject: Re: keyvalue cache
> >>
> >> I guess the benefit of the KV cache is that you are not holding entire 64K
> >> blocks in memory when you only care about 200 bytes of them. Would an
> >> alternative be to set a small block size (2KB or less)?
> >>
> >> The problems with small block sizes would be expensive block cache
> >> management overhead and inefficient scanning IO due to lack of read-ahead.
> >> Maybe improving the cache management and read-ahead would be more general
> >> improvements that don't add as much complexity?
> >>
> >> I'm having a hard time envisioning how you would do invalidations on the KV
> >> cache and how you would merge its entries into a scan, etc. Would it
> >> basically be a memstore in front of the memstore where KVs get individually
> >> invalidated instead of bulk-flushed? Would it be sorted or hashed?
> >>
> >> Matt
> >>
> >> On Wed, Apr 4, 2012 at 10:35 AM, Enis Söztutar <[email protected]> wrote:
> >>
> >> > As you said, caching the entire row does not make much sense, given that
> >> > the families are by contract the access boundaries. But caching column
> >> > families might be a good trade of for dealing with the per-item overhead.
> >> >
> >> > Also agreed on cache being configurable at the table or better cf level. I
> >> > think we can do something like enable_block_cache = true,
> >> > enable_kv_cache=false, per column family.
> >> >
> >> > Enis
> >> >
> >> > On Tue, Apr 3, 2012 at 11:03 PM, Vladimir Rodionov <[email protected]> wrote:
> >> >
> >> > > Usually make sense for tables with random mostly access (point queries)
> >> > > For short-long scans block cache is preferable.
> >> > > Cassandra has it (Row cache) but as since they cache the whole row (which
> >> > > can be very large) in many cases
> >> > > it has sub par performance. Make sense to make caching configurable: table
> >> > > can use key-value cache and do not use block cache
> >> > > and vice verse.
> >> > >
> >> > > Best regards,
> >> > > Vladimir Rodionov
> >> > > Principal Platform Engineer
> >> > > Carrier IQ, www.carrieriq.com
> >> > > e-mail: [email protected]
> >> > >
> >> > > ________________________________________
> >> > > From: Enis Söztutar [[email protected]]
> >> > > Sent: Tuesday, April 03, 2012 3:34 PM
> >> > > To: [email protected]
> >> > > Subject: keyvalue cache
> >> > >
> >> > > Hi,
> >> > >
> >> > > Before opening the issue, I though I should ask around first. What do you
> >> > > think about a keyvalue cache sitting on top of the block cache? It is
> >> > > mentioned in the big table paper, and it seems that zipfian kv access
> >> > > patterns might benefit from something like this a lot. I could not find
> >> > > anybody who proposed that before.
> >> > >
> >> > > What do you guys think? Should we pursue a kv query-cache. My gut feeling
> >> > > says that especially for some workloads we might gain significant
> >> > > performance improvements, but we cannot verify it, until we implement and
> >> > > profile it, right?
> >> > >
> >> > > Thanks,
> >> > > Enis
> >> > >
> >> > > Confidentiality Notice: The information contained in this message,
> >> > > including any attachments hereto, may be confidential and is intended to be
> >> > > read only by the individual or entity to whom this message is addressed.
> >> > > If the reader of this message is not the intended recipient or an agent or
> >> > > designee of the intended recipient, please note that any review, use,
> >> > > disclosure or distribution of this message or its attachments, in any form,
> >> > > is strictly prohibited. If you have received this message in error, please
> >> > > immediately notify the sender and/or [email protected] and
> >> > > delete or destroy any copy of this message and its attachments.
