Hi Charles,
I’m definitely open to collaboration. For now, development is happening in my personal repository, so you’re very welcome to participate there with questions, feedback, or feature requests: https://github.com/VladRodionov/hbase-row-cache

There are still a few important areas actively being worked on, including proper handling of bulk-loaded data, region reassignments, and support for sparse rows with in-place mutation of cached data. Once these pieces are in place, I expect the solution to be much more complete.

Longer term, I do think there may be an opportunity for a PR or some form of upstream contribution, although I realize there is already similar functionality under development, so it’s not yet clear how overlapping efforts might converge. In any case, I’m happy to keep the discussion going and see where things lead.

Best regards,
Vlad

On Tue, Jan 13, 2026 at 7:01 AM Charles Connell via dev <[email protected]> wrote:
> I'm interested in trying out the row cache for some of our data sets
> at HubSpot. No timeline available yet, although I'm sure it will be
> before the end of 2026. I'm excited to see what I can do for us.
>
> On Mon, Jan 12, 2026 at 10:52 PM Vladimir Rodionov
> <[email protected]> wrote:
> >
> > Forgot to mention: Row Cache can easily be made cache-implementation
> > agnostic (Caffeine, EHCache), if that matters.
> >
> > On Mon, Jan 12, 2026 at 6:27 PM Vladimir Rodionov
> > <[email protected]> wrote:
> > >
> > > Andor, below are my answers to your questions:
> > >
> > > > Don't the benefits of row-based caching strongly depend on the use case?
> > >
> > > Sure. It targets point queries, not scan operations. The repo wiki I posted the link to:
> > > https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
> > > describes numerous use cases where a row cache will be useful.
> > >
> > > > What's the advantage if clients don't always need the entire row, just a subset of cells?
> > >
> > > Yes, this is a known limitation of the current version. There is an open
> > > ticket to support "sparse" rows here:
> > > https://github.com/VladRodionov/hbase-row-cache/issues/26
> > >
> > > > Is block cache more performant and memory efficient in this case?
> > >
> > > The only use case where the block cache will be more performant is a scan
> > > operation, which involves multiple rows. These caches are complementary,
> > > not mutually exclusive. Row Cache has a serious advantage for point queries
> > > (it can do up to 100K ops/s on full-row reads, where each row has 3 column
> > > families with 3 columns and 10 versions each). Block cache is more suitable
> > > for larger operations, such as scans over multiple rows.
> > > Row cache can be enabled/disabled per table and per column family.
> > >
> > > From a RAM-usage perspective, Row Cache (Carrot Cache) uses an advanced
> > > data compression scheme (zstd with a dictionary), which usually saves an
> > > additional 40-50% of RAM compared to non-dictionary-based compression
> > > algorithms, and it works well even when individual data items are smaller
> > > than 100 bytes. The HBase Block Cache (Bucket Cache) uses this type of
> > > compression as well (maybe I am wrong here?), but it compresses the
> > > whole block.
> > >
> > > Performance-wise, I think Row Cache should be much faster than Block Cache
> > > when the cached blocks are compressed, because a point read then has to
> > > decompress and decode the whole block.
> > >
> > > Another limitation of the Block (Bucket) cache is its high metadata
> > > overhead (100+ bytes per entry vs 12-16 bytes in Row Cache). All metadata
> > > in Row Cache (Carrot Cache) is kept off-heap as well.
> > >
> > > The repo has a nice write-up on when Row Cache is preferable to the
> > > Block cache.
> > >
> > > On Mon, Jan 12, 2026 at 5:27 PM Andor Molnár <[email protected]> wrote:
> > >
> > >> Thanks Vladimir.
> > >>
> > >> I think this would be a great addition to HBase.
> > >>
> > >> Don't the benefits of row-based caching strongly depend on the use case?
> > >> What's the advantage if clients don't always need the entire row, just a
> > >> subset of cells?
> > >> Is block cache more performant and memory efficient in this case?
> > >>
> > >> Regards,
> > >> Andor
> > >>
> > >> > On Jan 4, 2026, at 13:02, Vladimir Rodionov <[email protected]> wrote:
> > >> >
> > >> > Hello HBase community,
> > >> >
> > >> > I'd like to start a discussion around a feature that exists in related
> > >> > systems but is still missing in Apache HBase: row-level caching.
> > >> >
> > >> > Both *Cassandra* and *Google Bigtable* provide a row cache for hot rows.
> > >> > Bigtable recently revisited this area and reported measurable gains for
> > >> > single-row reads. HBase today relies almost entirely on *block cache*,
> > >> > which is excellent for scans and predictable access patterns, but can be
> > >> > inefficient for *small random reads*, *hot rows spanning multiple blocks*,
> > >> > and *cloud / object-store-backed deployments*.
> > >> >
> > >> > To explore this gap, I've been working on an *HBase Row Cache for HBase 2.x*,
> > >> > implemented as a *RegionObserver coprocessor*, and I'd appreciate feedback
> > >> > from HBase developers and operators.
> > >> >
> > >> > *Project*:
> > >> > https://github.com/VladRodionov/hbase-row-cache
> > >> >
> > >> > *Background / motivation (cloud focus)*:
> > >> > https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
> > >> >
> > >> > *What This Is*
> > >> >
> > >> > - Row-level cache for HBase 2.x (coprocessor-based)
> > >> > - Powered by *Carrot Cache* (mostly off-heap, GC-friendly)
> > >> > - Multi-level cache (L1/L2/L3)
> > >> > - Read-through caching of table : rowkey : column-family
> > >> > - Cache invalidation on any mutation of the corresponding row+CF
> > >> > - Designed for *read-mostly, random-access* workloads
> > >> > - Can be enabled per table or per column family
> > >> > - Typically used *instead of*, not alongside, block cache
> > >> >
> > >> > *Block Cache vs Row Cache (Conceptual)*
> > >> >
> > >> > | Aspect                            | Block Cache                 | Row Cache                          |
> > >> > |-----------------------------------|-----------------------------|------------------------------------|
> > >> > | Cached unit                       | HFile block (e.g. 64KB)     | Row / column family                |
> > >> > | Optimized for                     | Scans, sequential access    | Random small reads, hot rows       |
> > >> > | Memory efficiency for small reads | Low (unused data in blocks) | High (cache only requested data)   |
> > >> > | Rows spanning multiple blocks     | Multiple blocks cached      | Single cache entry                 |
> > >> > | Read-path CPU cost                | Decode & merge every read   | Amortized across hits              |
> > >> > | Cloud / object store fit          | Necessary but expensive     | Reduces memory & I/O amplification |
> > >> >
> > >> > Block cache remains essential; row cache targets a *different
> > >> > optimization point*.
> > >> >
> > >> > *Non-Goals (Important)*
> > >> >
> > >> > - Not proposing removal or replacement of block cache
> > >> > - Not suggesting this be merged into HBase core
> > >> > - Not targeting scan-heavy or sequential workloads
> > >> > - Not eliminating row reconstruction entirely
> > >> > - Not optimized for write-heavy or highly mutable tables
> > >> > - Not changing HBase storage or replication semantics
> > >> >
> > >> > This is an *optional optimization* for a specific class of workloads.
> > >> >
> > >> > *Why I'm Posting*
> > >> >
> > >> > This is *not a merge proposal*, but a request for discussion:
> > >> >
> > >> > 1. Do you see *row-level caching* as relevant for modern HBase deployments?
> > >> > 2. Are there workloads where block cache alone is insufficient today?
> > >> > 3. Is a coprocessor-based approach reasonable for experimentation?
> > >> > 4. Are there historical or architectural reasons why row cache never
> > >> >    landed in HBase?
> > >> >
> > >> > Any feedback, positive or critical, is very welcome.
> > >> >
> > >> > Best regards,
> > >> >
> > >> > Vladimir Rodionov
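
For anyone who has not written a RegionObserver before, here is a minimal sketch of the read-through / invalidate-on-mutation pattern described in the announcement above. It is illustrative only and is not the project's code: the class and package names are made up, the cache is a plain on-heap map instead of Carrot Cache, the key ignores the column-family component, and Deletes, bulk loads, sparse rows, TTLs, and eviction are all omitted.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.RegionObserver;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.wal.WALEdit;

/** Toy read-through row cache; NOT the hbase-row-cache implementation. */
public class ToyRowCacheObserver implements RegionCoprocessor, RegionObserver {

  // On-heap stand-in for Carrot Cache, keyed by "table:rowkey".
  private final ConcurrentHashMap<String, List<Cell>> cache = new ConcurrentHashMap<>();

  @Override
  public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this);
  }

  private String key(ObserverContext<RegionCoprocessorEnvironment> ctx, byte[] row) {
    String table = ctx.getEnvironment().getRegionInfo().getTable().getNameAsString();
    return table + ":" + Bytes.toStringBinary(row);
  }

  // On a hit, serve the Get from the cache and bypass the normal read path.
  // (In HBase 2.x, bypass() is honored only by a subset of hooks; check the
  // RegionObserver javadoc for your version.)
  @Override
  public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                       Get get, List<Cell> result) throws IOException {
    List<Cell> cached = cache.get(key(ctx, get.getRow()));
    if (cached != null) {
      result.addAll(cached);
      ctx.bypass();
    }
  }

  // On a miss, remember what the store returned (read-through population).
  // A real implementation would cache the full row per column family rather
  // than the shape of this particular Get.
  @Override
  public void postGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                        Get get, List<Cell> result) throws IOException {
    if (!result.isEmpty()) {
      cache.putIfAbsent(key(ctx, get.getRow()),
          Collections.unmodifiableList(new ArrayList<>(result)));
    }
  }

  // Invalidate on mutation; a real implementation also hooks Delete, Append,
  // Increment, batch mutations, and bulk loads.
  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                      Put put, WALEdit edit, Durability durability) throws IOException {
    cache.remove(key(ctx, put.getRow()));
  }
}
```

The actual project additionally keys on the column family, keeps entries and metadata off-heap via Carrot Cache, and respects the per-table/per-CF enable switch; a rough deployment sketch for that switch follows below.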

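On the deployment side, the per-table / per-column-family enablement mentioned in the thread could be wired up roughly as shown below with the recent HBase 2.x Admin API. The jar location, class name, table and family names, and the ROWCACHE configuration key are all placeholders invented for this sketch; the project's README is the authority on the real attribute names and installation steps.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

/** Attaches a (placeholder) row-cache coprocessor to one table and flags one CF. */
public class EnableRowCacheExample {

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {

      TableName table = TableName.valueOf("users");   // placeholder table name
      byte[] family = Bytes.toBytes("profile");       // placeholder column family
      TableDescriptor current = admin.getDescriptor(table);

      TableDescriptor updated = TableDescriptorBuilder.newBuilder(current)
          // Register the observer for this table only; the class (placeholder
          // name) must already be on the RegionServer classpath.
          .setCoprocessor("com.example.rowcache.ToyRowCacheObserver")
          // Flag the column family; 'ROWCACHE' is a hypothetical key the
          // observer would read to decide whether to cache this CF.
          .modifyColumnFamily(ColumnFamilyDescriptorBuilder
              .newBuilder(current.getColumnFamily(family))
              .setConfiguration("ROWCACHE", "true")
              .build())
          .build();

      admin.modifyTable(updated);
    }
  }
}
```

An alternative is to load the coprocessor cluster-wide via hbase.coprocessor.region.classes in hbase-site.xml and rely solely on per-table/per-CF attributes to decide where caching is actually active.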