Yes, this matches up well with my intuitions. In a dynamic workload with random accesses spread across the primary key space, memcache distributed over the region servers, or even the client nodes, should provide a performance benefit by caching individual hot rows and serving them at low latency; Hbase is targeted much more at throughput. This might also reduce the number of regions kept in RAM at one time - it's a shame to spend a whole region's worth of RAM just to get at a single row if you don't need to - so it could actually reduce overall system-wide RAM pressure.
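A minimal sketch of the read-through pattern described above, written against the spymemcached client and the modern HBase Java client API for illustration (both postdate this 2007 thread); the table name, column family, qualifier, and TTL are hypothetical:

import java.net.InetSocketAddress;

import net.spy.memcached.MemcachedClient;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Read-through cache: serve hot rows from memcached, fall back to HBase on a miss. */
public class CachedRowReader {
    private static final int TTL_SECONDS = 300; // hypothetical: entries expire after 5 minutes

    private final MemcachedClient cache;
    private final Table table;

    public CachedRowReader() throws Exception {
        cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        Configuration conf = HBaseConfiguration.create();
        Connection conn = ConnectionFactory.createConnection(conf);
        table = conn.getTable(TableName.valueOf("mytable")); // hypothetical table name
    }

    public byte[] getValue(String rowKey) throws Exception {
        // 1. Check memcached first: a hit never touches a region server.
        byte[] cached = (byte[]) cache.get(rowKey);
        if (cached != null) {
            return cached;
        }
        // 2. Miss: read the single row from HBase.
        Result result = table.get(new Get(Bytes.toBytes(rowKey)));
        byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col")); // hypothetical family/qualifier
        // 3. Populate the cache so later reads of this hot row are served from RAM.
        if (value != null) {
            cache.set(rowKey, TTL_SECONDS, value);
        }
        return value;
    }
}

One caveat with this pattern: writes to HBase must also invalidate or overwrite the memcached entry, or readers will see stale rows until the TTL expires.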
Of course, the proof is in the pudding and it is highly dependent on the workload. I'd certainly love to hear about anyone's experiences trying this out.

Chad

On 12/4/07 10:15 AM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:

It is conceivable that memcache would eventually have only or mostly active objects in memory, while hbase might have active pages/tablets/groups of objects. That might give memcache a bit of an edge.

Another thing that happens with memcache is that it can hold the results of a complex join which some component views as a single object. The database doesn't normally view these as a single object and thus may not have as much locality.

You might view memcache as an interesting transpose from column-oriented data (hbase) to row-oriented cache (memcache). That could easily result in interesting performance trade-offs. Hbase should be good for scanning; memcache might be better for single-object access.

On 12/4/07 9:05 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

>
>> 5. Memory caching: Instead of pinning a whole Hbase table in RAM, I'd
>> recommend the use of memcached in front of Hbase to provide cached read
>> access.
>
> Memcached is useful when many nodes need to access the same data. It
> pools and shares memory across a cluster. In HBase, each node caches a
> different portion of a table, no? So I don't see how memcached would
> help there.
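Ted's point about caching the result of a complex join can be sketched the same way: assemble a composite from several HBase reads once, then store it in memcached under a single key, so later requests fetch one pre-joined object in one round trip. The UserPage class, key scheme, and stubbed assembly below are hypothetical, purely for illustration:

import java.io.Serializable;
import java.net.InetSocketAddress;

import net.spy.memcached.MemcachedClient;

public class CompositeCacheExample {

    /** A hypothetical composite assembled from several HBase rows. HBase sees
     *  these pieces as separate rows/columns; memcached can hold the joined
     *  result as one object under one key. */
    public static class UserPage implements Serializable {
        private static final long serialVersionUID = 1L;
        public String profileJson;
        public String[] recentPostIds;
        public long followerCount;
    }

    public static UserPage getUserPage(MemcachedClient cache, String userId) throws Exception {
        String key = "userpage:" + userId; // hypothetical key scheme
        // One memcached round trip retrieves the entire pre-joined object.
        UserPage page = (UserPage) cache.get(key);
        if (page == null) {
            page = assembleFromHBase(userId); // several scattered HBase reads (stubbed below)
            cache.set(key, 300, page);        // spymemcached serializes Serializable objects
        }
        return page;
    }

    // Stub: in a real system this would issue multiple HBase Gets/Scans
    // across the rows and column families that make up the page.
    private static UserPage assembleFromHBase(String userId) {
        return new UserPage();
    }

    public static void main(String[] args) throws Exception {
        MemcachedClient cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        UserPage page = getUserPage(cache, "user42");
        cache.shutdown();
    }
}

The locality Ted describes falls out of the key design: the cache is organized around whatever unit the application treats as a single object, regardless of how HBase lays the pieces out column-wise.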
