I mean an external SSD based storage, not inside the HBase cluster. We
already have a memcached block cache implementation in our code base, and I
think it would also be easy to implement a Redis based block cache; then
there are a bunch of open source projects which could be used, for example
this one: https://github.com/OpenAtomFoundation/pikiwidb

Thanks.

Wellington Chevreuil <[email protected]> wrote on Wed, Feb 25, 2026 at 23:50:
>
> Our main reason to pick HDFS storage types/policies was to leverage
> redundancy and distribution. The current architecture has limitations with
> OS upgrades and node scaling, which are common use cases.
>
> For OS upgrades, the updated instance must be restarted, the SSD disk is
> reset and the whole cache is lost, requiring reads to be served again from
> S3. However, if we instead use the SSD disks for the HDFS SSD policy, the
> reads would still be served by nearby DataNodes, provided fewer nodes are
> restarted than the replication factor.
>
> For node scaling, regions need to be moved around different servers. That
> too is problematic currently, as it causes newly moved regions to be
> uncached and read from S3.
>
> We originally thought about wrapping HDFS SSD storage in a BucketCache
> file engine or a brand new block cache implementation, but BucketCache
> requires random file access to store blocks on different segments of a few
> large local files, whilst implementing a new block cache type that saves
> each block directly as a single file in HDFS could hammer the NameNode
> with too many small files.
>
> But of course, any distributed cache solution that could be mounted over
> these disks and wrapped by a block cache implementation can solve this.
> The tradeoff would be yet another service to deploy and manage within the
> HBase cluster itself, whilst HDFS is already present here because we need
> it for WALs.
>
> On Wed, Feb 25, 2026 at 02:19, 张铎 (Duo Zhang) <[email protected]>
> wrote:
>
> > What about implementing an SSD based external block cache storage?
> >
> > Wellington Chevreuil <[email protected]> wrote on Wed, Feb 25,
> > 2026 at 06:12:
> > >
> > > Dear HBase Dev Community,
> > >
> > > In our quest to pursue optimal performance for HBase when configuring
> > > the root dir over cloud storage, we have been relying heavily on a
> > > local cache using the ephemeral SSD disks normally available on given
> > > cloud provider instance types.
> > >
> > > Currently, our reference architecture for such use cases leverages the
> > > file based BucketCache, but that has certain limitations, and we are
> > > now at the early stages of testing an alternative approach using HDFS
> > > storage types and policies to mirror root dir content from S3 to HDFS
> > > dirs mounted on these ephemeral SSD disks. Additional technical
> > > details are documented here:
> > > https://docs.google.com/document/d/1XcUSBrsVTQLq_j_Tq6FrC_NRPiVNxdIU8gu2gDPT-eM/edit?tab=t.0
> > >
> > > So far in our PoC, we have implemented and tested the reads/writes
> > > from/to the different file system locations. We would like to continue
> > > this feature implementation in collaboration with the community, so we
> > > plan to open an umbrella Jira ticket, together with individual
> > > subtasks for each functionality described in the document mentioned
> > > above. Comments and feedback are very welcome.
> > >
> > > Thanks,
> > > Wellington
> >
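To make the external block cache suggestion concrete, here is a minimal, hypothetical sketch in Java of a block cache that delegates storage to an external key-value store. The `KeyValueStore` interface and all class/method names are illustrative, not HBase's actual `BlockCache` API, and an in-memory map stands in for a real Redis client (e.g. Jedis talking to Redis or a Redis-protocol store like pikiwidb) so the sketch is self-contained:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a block cache backed by an external key-value store.
// All names here are illustrative, not HBase's real BlockCache interface.
public class ExternalBlockCacheSketch {

  // Minimal stand-in for a Redis client: in a real implementation these
  // calls would be network operations such as jedis.set(...) / jedis.get(...).
  interface KeyValueStore {
    void put(String key, byte[] value);
    byte[] get(String key);
    void delete(String key);
  }

  // In-memory fake used only so this sketch runs without an external service.
  static class InMemoryStore implements KeyValueStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();
    public void put(String key, byte[] value) { data.put(key, value); }
    public byte[] get(String key) { return data.get(key); }
    public void delete(String key) { data.remove(key); }
  }

  private final KeyValueStore store;

  public ExternalBlockCacheSketch(KeyValueStore store) {
    this.store = store;
  }

  // Cache key built from HFile name plus block offset, mirroring how a
  // cached block is typically identified.
  private static String cacheKey(String hfileName, long offset) {
    return hfileName + ":" + offset;
  }

  public void cacheBlock(String hfileName, long offset, byte[] block) {
    store.put(cacheKey(hfileName, offset), block);
  }

  // Returns the cached block, or empty on a miss (caller falls back to S3).
  public Optional<byte[]> getBlock(String hfileName, long offset) {
    return Optional.ofNullable(store.get(cacheKey(hfileName, offset)));
  }

  public void evictBlock(String hfileName, long offset) {
    store.delete(cacheKey(hfileName, offset));
  }
}
```

Because the store lives outside the region server process, cached blocks would survive a region server restart, which is the property sought in the OS-upgrade scenario discussed above.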

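For reference, the HDFS storage-policy side of the proposal is driven by standard Hadoop tooling. A configuration sketch of the relevant commands follows; the /hbase-cache path is illustrative, and it assumes the DataNodes expose [SSD]-tagged volumes in dfs.datanode.data.dir:

```shell
# List the storage policies the cluster supports (HOT, ALL_SSD, ONE_SSD, ...).
hdfs storagepolicies -listPolicies

# Pin an HDFS directory (illustrative path) to SSD-backed volumes.
hdfs storagepolicies -setStoragePolicy -path /hbase-cache -policy ALL_SSD

# Verify the policy took effect.
hdfs storagepolicies -getStoragePolicy -path /hbase-cache
```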