Hi all, I’d like to start a discussion about refactoring the HBase block cache subsystem into a more modular and pluggable architecture.

*Motivation*

The current block cache design (BlockCache, CombinedBlockCache, BucketCache) mixes several concerns:

1. storage implementation
2. L1/L2 topology and orchestration
3. placement logic and admission behavior

This makes it difficult to:

- introduce alternative cache implementations
- evolve cache policies independently
- experiment with different topologies

In addition, at large scale the current implementation can incur noticeable metadata overhead. For example, with 64KB blocks and ~1.6TB cache, BucketCache may consume on the order of ~9GB of metadata, reducing effective cache capacity.

*Proposal (high level)*

Introduce a layered internal architecture:

1. BlockCacheEngine – storage abstraction (Lru, Bucket, etc.)
2. CacheTopology – L1/L2 coordination (exclusive/inclusive)
3. CachePlacementPolicy – admission, placement, promotion decisions
4. CacheAccessService – unified entry point for read/write paths

One important addition is explicit admission control on the cache insertion path (put), allowing better handling of:

- scan-once workloads
- compaction-generated blocks
- prefetch behavior
- block-type-aware caching (data vs index vs bloom)

The goal is to keep behavior unchanged initially, and introduce this structure incrementally.

*Implementation plan*

The work is organized under an umbrella JIRA: HBASE-30018 <https://issues.apache.org/jira/browse/HBASE-30018> (Pluggable Block Cache Architecture)

Planned phases:

1. Introduce internal APIs (no behavior change)
2. Refactor CombinedBlockCache into explicit topology layer
3. Adapt BucketCache to new interfaces
4. Enable alternative cache engines (e.g., CarrotCache, EHCache)

*Questions / feedback*

I’d appreciate feedback on:

- overall direction and layering
- separation between topology and policy
- admission control on the put path
- compatibility concerns with existing implementations
- any known pitfalls in HFileReaderImpl / write path integration

If there is general agreement, I’ll start with a small initial patch introducing the internal interfaces with no behavior change.

Thanks,
Vladimir
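
P.S. To make the layering more concrete, here is a rough Java sketch of how the four pieces could fit together. It is only an illustration under stated assumptions: every type and method signature below (including the BlockKind, BlockSource and Placement enums, MapEngine, ExclusiveTopology and the String keys) is a hypothetical placeholder, not an existing HBase API and not the proposed final API. A plain HashMap stands in for a real engine, and the example policy is just one possible admission rule.

```java
import java.util.HashMap;
import java.util.Map;

// All names in this file are hypothetical; none are existing HBase APIs.
class BlockCacheSketch {
  static CacheAccessService newService() {
    CacheTopology topology = new ExclusiveTopology(new MapEngine(), new MapEngine());
    // Example policy (an assumption): index/bloom blocks go to L1, data blocks
    // go to L2, and scan-once or compaction data blocks are not admitted.
    CachePlacementPolicy policy = (kind, source) -> {
      if (kind != BlockKind.DATA) return Placement.L1;
      if (source == BlockSource.SCAN_ONCE || source == BlockSource.COMPACTION) {
        return Placement.REJECT;
      }
      return Placement.L2;
    };
    return new CacheAccessService(topology, policy);
  }

  public static void main(String[] args) {
    CacheAccessService cache = newService();
    boolean index = cache.cacheBlock("index-0", new byte[] {1}, BlockKind.INDEX, BlockSource.READ);
    boolean scan = cache.cacheBlock("data-0", new byte[] {2}, BlockKind.DATA, BlockSource.SCAN_ONCE);
    System.out.println("index admitted: " + index + ", scan-once data admitted: " + scan);
  }
}

enum BlockKind { DATA, INDEX, BLOOM }
enum BlockSource { READ, SCAN_ONCE, COMPACTION, PREFETCH }
enum Placement { REJECT, L1, L2 }

/** Storage only: no knowledge of tiers or policy. */
interface BlockCacheEngine {
  byte[] get(String key);
  void put(String key, byte[] block);
  void evict(String key);
}

/** Admission and placement decision for the put path. */
interface CachePlacementPolicy {
  Placement place(BlockKind kind, BlockSource source);
}

/** L1/L2 coordination. */
interface CacheTopology {
  byte[] get(String key);
  void put(String key, byte[] block, Placement tier);
}

/** Toy engine backed by a HashMap; stands in for an Lru or Bucket engine. */
final class MapEngine implements BlockCacheEngine {
  private final Map<String, byte[]> blocks = new HashMap<>();
  public byte[] get(String key) { return blocks.get(key); }
  public void put(String key, byte[] block) { blocks.put(key, block); }
  public void evict(String key) { blocks.remove(key); }
}

/** Exclusive topology: a block lives in at most one tier. */
final class ExclusiveTopology implements CacheTopology {
  private final BlockCacheEngine l1;
  private final BlockCacheEngine l2;
  ExclusiveTopology(BlockCacheEngine l1, BlockCacheEngine l2) {
    this.l1 = l1;
    this.l2 = l2;
  }
  public byte[] get(String key) {
    byte[] block = l1.get(key);
    return block != null ? block : l2.get(key);
  }
  public void put(String key, byte[] block, Placement tier) {
    BlockCacheEngine target = tier == Placement.L1 ? l1 : l2;
    BlockCacheEngine other = tier == Placement.L1 ? l2 : l1;
    other.evict(key); // keep the two tiers exclusive
    target.put(key, block);
  }
}

/** Single entry point for read and write paths. */
final class CacheAccessService {
  private final CacheTopology topology;
  private final CachePlacementPolicy policy;
  CacheAccessService(CacheTopology topology, CachePlacementPolicy policy) {
    this.topology = topology;
    this.policy = policy;
  }
  byte[] getBlock(String key) { return topology.get(key); }
  /** Returns false when the policy rejects the block on the put path. */
  boolean cacheBlock(String key, byte[] block, BlockKind kind, BlockSource source) {
    Placement tier = policy.place(kind, source);
    if (tier == Placement.REJECT) return false;
    topology.put(key, block, tier);
    return true;
  }
}
```

With the example policy above, the index block is admitted and the scan-once data block is rejected before it reaches any engine. The point of the split is that swapping MapEngine for another engine, or the lambda for another policy, would not touch the other layers.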
