I mean an external SSD-based storage, not inside the HBase cluster.
We have a memcached block cache implementation in our code base, and I
think it would also be easy to implement a Redis-based block cache;
then there is a bunch of open source projects that could be used, for
example this one:

https://github.com/OpenAtomFoundation/pikiwidb
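
As a rough illustration (the `KvClient` and `ExternalBlockCache` names
below are hypothetical stand-ins, not the actual classes in our code
base), such an external block cache would look something like:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a block cache that delegates to an external
// key-value store (memcached, Redis, pikiwidb, ...). KvClient and
// InMemoryKvClient are illustrative stand-ins for a real client; a
// production version would implement
// org.apache.hadoop.hbase.io.hfile.BlockCache.
interface KvClient {
    void put(String key, byte[] value);
    byte[] get(String key); // null on miss
}

// In-memory stand-in so the sketch is self-contained; a real
// deployment would talk to the external store over the network.
class InMemoryKvClient implements KvClient {
    private final Map<String, byte[]> store = new HashMap<>();
    public void put(String key, byte[] value) { store.put(key, value); }
    public byte[] get(String key) { return store.get(key); }
}

class ExternalBlockCache {
    private final KvClient client;
    ExternalBlockCache(KvClient client) { this.client = client; }

    // Key blocks by "<hfileName>:<offset>", so cached blocks survive a
    // region server restart as long as the external store stays up.
    void cacheBlock(String hfileName, long offset, byte[] block) {
        client.put(hfileName + ":" + offset, block);
    }

    // Returns null on a miss; the caller then falls back to S3/HDFS.
    byte[] getBlock(String hfileName, long offset) {
        return client.get(hfileName + ":" + offset);
    }
}
```

Because the cache lives outside the region server process, it is
unaffected by OS upgrades or region moves on the HBase side.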

Thanks.

Wellington Chevreuil <[email protected]> wrote on Wed, Feb 25, 2026 at 23:50:
>
> Our main reason to pick hdfs storage types/policies was to leverage
> redundancy and distribution. The current architecture has limitations with
> OS upgrades and node scaling, which are common use cases.
>
> For OS upgrades, the updated instance must be restarted; the SSD disk is
> reset and the whole cache is lost, requiring reads to be served from S3
> again. However, if we instead use the SSD disks for the HDFS SSD policy,
> reads would still be served by nearby DNs, provided fewer nodes are
> restarted than the replication factor.
>
> For node scaling, regions need to be moved around different servers. That
> too is currently problematic, as it causes newly moved regions to be
> uncached and read from S3.
>
> We originally thought about wrapping HDFS SSD storage in a BucketCache file
> engine or a brand new block cache implementation, but the bucket cache
> requires random file access to store blocks in different segments of a few
> large local files, whilst implementing a new block cache type that saves
> each block directly as a single file in HDFS could hammer the namenode with
> too many small files.
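>
> For reference, the file-based BucketCache mentioned above is configured
> through standard hbase-site.xml properties; the mount path below is just
> an example:
>
> ```xml
> <!-- File-backed BucketCache over a local ephemeral SSD mount. -->
> <property>
>   <name>hbase.bucketcache.ioengine</name>
>   <value>file:/mnt/ssd1/bucketcache</value>
> </property>
> <property>
>   <name>hbase.bucketcache.size</name>
>   <value>8192</value> <!-- cache size in MB -->
> </property>
> ```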
>
> But of course, any distributed cache solution that could be mounted over
> these disks and wrapped by a block cache implementation can solve this. The
> tradeoff would be yet another service to deploy and manage within the hbase
> cluster itself, whilst HDFS is already present here because we need it for
> WALs.
>
> On Wed, Feb 25, 2026 at 02:19, 张铎 (Duo Zhang) <[email protected]>
> wrote:
>
> > What about implementing a SSD based external block cache storage?
> >
> > Wellington Chevreuil <[email protected]> wrote on Wed, Feb 25,
> > 2026 at 06:12:
> > >
> > > Dear HBase Dev Community,
> > >
> > > In our quest for optimal performance of HBase when configuring the
> > > root dir over cloud storage, we have been relying heavily on a local
> > > cache using the ephemeral SSD disks normally available on a given
> > > cloud provider's instance types.
> > >
> > > Currently, our reference architecture for such use cases leverages the
> > > file-based BucketCache, but that has certain limitations, and we are
> > > now at the early stages of testing an alternative approach using HDFS
> > > storage types and policies to mirror root dir content from S3 to HDFS
> > > dirs mounted on these ephemeral SSD disks. Additional technical details
> > > are documented here:
> > > https://docs.google.com/document/d/1XcUSBrsVTQLq_j_Tq6FrC_NRPiVNxdIU8gu2gDPT-eM/edit?tab=t.0
> > >
> > > So far in our PoC, we have implemented and tested reads from and
> > > writes to the different file system locations. We would like to
> > > continue this feature's implementation in collaboration with the
> > > community, so we plan to open an umbrella jira ticket, together with
> > > individual subtasks for each functionality described in the document
> > > mentioned above. Comments and feedback are very welcome.
> > >
> > > Thanks,
> > > Wellington
> >
