Hello HBase community,

I’d like to start a discussion around a feature that exists in related
systems but is still missing in Apache HBase: row-level caching.

Both *Cassandra* and *Google Bigtable* provide a row cache for hot rows.
Bigtable recently revisited this area and reported measurable gains for
single-row reads. HBase today relies almost entirely on *block cache*,
which is excellent for scans and predictable access patterns, but can be
inefficient for *small random reads*, *hot rows spanning multiple blocks*,
and *cloud / object-store–backed deployments*.

To explore this gap, I’ve been working on an *HBase Row Cache for HBase 2.x*,
implemented as a *RegionObserver coprocessor*, and I’d appreciate feedback
from HBase developers and operators.

*Project*:

https://github.com/VladRodionov/hbase-row-cache


*Background / motivation (cloud focus):*

https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud

*What This Is*

   - Row-level cache for HBase 2.x (coprocessor-based)
   - Powered by *Carrot Cache* (mostly off-heap, GC-friendly)
   - Multi-level cache (L1/L2/L3)
   - Read-through caching at table : rowkey : column-family granularity
   - Cache invalidation on any mutation of the corresponding row+CF
   - Designed for *read-mostly, random-access* workloads
   - Can be enabled per table or per column family
   - Typically used *instead of*, not alongside, block cache for the tables or
     column families where it is enabled
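To make the read-through and invalidation bullets concrete, here is a minimal conceptual sketch in plain Java. It is not code from the hbase-row-cache project and uses neither Carrot Cache nor the HBase coprocessor API: a `ConcurrentHashMap` stands in for the off-heap multi-level cache, a `Function` stands in for the normal region read path, and the names `RowCacheSketch` and `RowCfKey` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Conceptual sketch (hypothetical names): read-through row cache keyed by table : rowkey : CF. */
public class RowCacheSketch {

    /** Hypothetical composite cache key; byte[] needs content-based equals/hashCode. */
    static final class RowCfKey {
        final String table;
        final byte[] row;
        final String family;

        RowCfKey(String table, byte[] row, String family) {
            this.table = table;
            this.row = row.clone(); // defensive copy: a caller mutating its array must not corrupt the key
            this.family = family;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RowCfKey)) {
                return false;
            }
            RowCfKey k = (RowCfKey) o;
            return table.equals(k.table) && family.equals(k.family) && Arrays.equals(row, k.row);
        }

        @Override
        public int hashCode() {
            return Objects.hash(table, family, Arrays.hashCode(row));
        }
    }

    // Stand-in for the real (off-heap, multi-level) cache; values are the cells of one row+CF.
    private final Map<RowCfKey, List<String>> cache = new ConcurrentHashMap<>();
    // Stand-in for the normal region read path (memstore + HFiles via block cache).
    private final Function<RowCfKey, List<String>> loader;

    RowCacheSketch(Function<RowCfKey, List<String>> loader) {
        this.loader = loader;
    }

    /** Read-through: return the cached row+CF on a hit; on a miss, load it once and cache it. */
    List<String> get(String table, byte[] row, String family) {
        return cache.computeIfAbsent(new RowCfKey(table, row, family), loader);
    }

    /** Invalidation: drop the cached row+CF after a mutation to it has been applied. */
    void invalidate(String table, byte[] row, String family) {
        cache.remove(new RowCfKey(table, row, family));
    }

    public static void main(String[] args) {
        // Toy backing store: "rowkey/family" -> cells.
        Map<String, List<String>> store = new ConcurrentHashMap<>();
        store.put("user42/cf", List.of("name=Ann"));
        AtomicInteger loads = new AtomicInteger();

        RowCacheSketch rc = new RowCacheSketch(k -> {
            loads.incrementAndGet();
            String id = new String(k.row, StandardCharsets.UTF_8) + "/" + k.family;
            return store.getOrDefault(id, List.of()); // empty list, not null, so absent rows are cached too
        });
        byte[] row = "user42".getBytes(StandardCharsets.UTF_8);

        System.out.println(rc.get("users", row, "cf")); // miss: loads [name=Ann]
        System.out.println(rc.get("users", row, "cf")); // hit: served from cache
        store.put("user42/cf", List.of("name=Bob"));     // mutation reaches the store...
        rc.invalidate("users", row, "cf");               // ...then the row+CF entry is dropped
        System.out.println(rc.get("users", row, "cf")); // miss: reloads [name=Bob]
        System.out.println("loads = " + loads.get());   // prints "loads = 2"
    }
}
```

One caveat on the design: the JDK documents that other updates to a `ConcurrentHashMap` may block while a `computeIfAbsent` computation is in progress, so running a slow storage read inside it is fine for a sketch but not a pattern to copy as-is; a real region-server cache needs explicit coordination between in-flight loads and concurrent mutations.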

*Block Cache vs Row Cache (Conceptual)*

Aspect                            | Block Cache                 | Row Cache
----------------------------------+-----------------------------+-----------------------------------
Cached unit                       | HFile block (e.g. 64KB)     | Row / column family
Optimized for                     | Scans, sequential access    | Random small reads, hot rows
Memory efficiency for small reads | Low (unused data in blocks) | High (cache only requested data)
Rows spanning multiple blocks     | Multiple blocks cached      | Single cache entry
Read-path CPU cost                | Decode & merge every read   | Amortized across hits
Cloud / object store fit          | Necessary but expensive     | Reduces memory & I/O amplification

Block cache remains essential; row cache targets a *different optimization
point*.
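To put rough numbers on the "Memory efficiency" and "Rows spanning multiple blocks" rows, here is a back-of-the-envelope calculation. Every input except the 64KB block size from the table is a hypothetical value chosen for illustration, and it models the worst case for block cache, where no two hot rows share a block; with good key locality, block cache does considerably better than this.

```java
/** Back-of-the-envelope cache footprint for N hot rows (all inputs hypothetical). */
public class CacheFootprint {

    // Block cache: each hot row pins its whole HFile block (or several, if the row spans blocks).
    static long blockCacheBytes(long hotRows, long blockSize, long blocksPerRow) {
        return hotRows * blockSize * blocksPerRow;
    }

    // Row cache: only the requested row+CF is stored, plus an assumed per-entry overhead.
    static long rowCacheBytes(long hotRows, long rowSize, long entryOverhead) {
        return hotRows * (rowSize + entryOverhead);
    }

    public static void main(String[] args) {
        long hotRows = 1_000_000;   // hypothetical number of hot rows, none sharing a block
        long blockSize = 64 * 1024; // 64KB, the example block size from the table
        long rowSize = 512;         // hypothetical average row+CF size in bytes
        long overhead = 64;         // hypothetical per-entry key/index overhead in bytes

        long block = blockCacheBytes(hotRows, blockSize, 1);
        long row = rowCacheBytes(hotRows, rowSize, overhead);
        System.out.printf("block cache: %d MiB%n", block / (1024 * 1024)); // 62500 MiB
        System.out.printf("row cache:   %d MiB%n", row / (1024 * 1024));   // 549 MiB
        System.out.printf("ratio:       %.1fx%n", (double) block / row);   // 113.8x
    }
}
```

Under these assumptions the ratio is simply (block size × blocks per row) / (row size + entry overhead), so it shrinks as rows grow or as hot rows cluster into the same blocks.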

*Non-Goals (Important)*

   - Not proposing removal or replacement of block cache
   - Not suggesting this be merged into HBase core
   - Not targeting scan-heavy or sequential workloads
   - Not eliminating row reconstruction entirely
   - Not optimized for write-heavy or highly mutable tables
   - Not changing HBase storage or replication semantics

This is an *optional optimization* for a specific class of workloads.

*Why I’m Posting*

This is *not a merge proposal*, but a request for discussion:

   1. Do you see *row-level caching* as relevant for modern HBase deployments?
   2. Are there workloads where block cache alone is insufficient today?
   3. Is a coprocessor-based approach reasonable for experimentation?
   4. Are there historical or architectural reasons why row cache never landed
      in HBase?

Any feedback—positive or critical—is very welcome.

Best regards,

Vladimir Rodionov