Hi, I'd like to check it out later. I used to maintain Hadoop KMS for HDFS and Ozone. It took us many years to stabilize it and improve its scale and performance, so there are a lot of lessons learned along the way. (I suppose HDFS at-rest encryption is not being considered, for whatever reason.)
On Mon, Nov 10, 2025 at 6:28 AM Hari Krishna Dara <[email protected]> wrote:
> Dear HBase Developers,
>
> I am pleased to announce that the key management feature for encryption at
> rest is now ready for community review. This is a significant enhancement
> to HBase's security capabilities, and I would greatly appreciate your
> feedback and insights.
>
> *Pull Request:* https://github.com/apache/hbase/pull/7421
> *Branch:* HBASE-29368-key-management-feature
> *Primary JIRA:* HBASE-29368
> *Design Document:*
> https://docs.google.com/document/d/1ToW_rveXHXUc1F6eFNQfu5LOeMAjzgq6FcYUDbdZrSM/edit?tab=t.0
>
> Overview
>
> This feature introduces a comprehensive key management system that
> extends HBase's existing encryption-at-rest capabilities. The
> implementation provides enterprise-grade key lifecycle management with
> support for key rotation, hierarchical namespace resolution for key lookup,
> key caching and improved integration with key management systems to handle
> key life cycles and external key changes.
>
> Key Features
>
> *1. Managed Keys Infrastructure*
>
> - Introduction of ManagedKeyProvider interface for pluggable key
>   provider implementations on the lines of the existing KeyProvider
>   interface.
> - The new interface can also return Data Encryption Keys (DEKs) and a
>   lot more details on the keys.
> - Comes with the default ManagedKeyStoreKeyProvider implementation using
>   Java KeyStore, similar to the existing KeyStoreKeyProvider.
> - Enables logical key isolation for multi-tenant scenarios through
>   custodian identifiers (future use cases) and the special default global
>   custodian.
> - Hierarchical namespace resolution for DEKs with automatic fallback:
>   explicit CF namespace attribute → constructed table/family namespace →
>   table name → global namespace
>
> *2. System Key (STK) Management*
>
> - Cluster-wide system key for wrapping data encryption keys (DEKs). This
>   is equivalent to the existing master key, but better managed and
>   operation friendly.
> - Secure storage in HDFS with support for automatic key rotation during
>   boot up.
> - Admin API to trigger key rotation and propagation to all RegionServers
>   without needing to do a rolling restart.
> - Preserves the current double-wrapping architecture: DEKs wrapped by
>   STK, STK sourced from external KMS
>
> *3. KeymetaAdmin API*
>
> - enableKeyManagement(keyCust, keyNamespace) - Enable key management for
>   a custodian/namespace pair
> - getManagedKeys(keyCust, keyNamespace) - Query key status and metadata
> - rotateSTK() - Check for and propagate new system keys
> - disableKeyManagement(keyCust, keyNamespace) - Disable all the keys for
>   a custodian/namespace (TBD)
> - disableManagedKey(keyCust, keyNamespace, keyMetadataHash) - Disable a
>   specific key (TBD)
> - rotateManagedKey(keyCust, keyNamespace) - Rotate the active key (TBD)
> - refreshManagedKeys(keyCust, keyNamespace) - Refresh from external KMS
>   to validate all the keys. (TBD)
> - Internal cache management operations for convenience and meeting SLAs.
>   (TBD)
>
> *4. Persistent Key Metadata Storage*
>
> - New system table hbase:keymeta for storing key metadata and state
>   which acts as an L2 cache.
> - Tracks key lifecycle: ACTIVE, INACTIVE, DISABLED, FAILED states
> - Stores wrapped DEKs and metadata for key lookup without depending on
>   external KMS.
> - Optimized for high-priority access with in-memory column families
> - Key metadata tracking with cryptographic hashes for integrity
>   verification
>
> *5. Multi-Layer Caching*
>
> - L1: In-memory Caffeine cache on RegionServers for hot key data
> - L2: Keymeta table for persistent key metadata that is shared across
>   all RegionServers.
> - L3: Dynamic lookup from external KMS as fallback when not found in L2.
> - Cache invalidation mechanism for key rotation scenarios
>
> *6. HBase Shell Integration*
>
> - enable_key_management - Enable key management for a custodian and
>   namespace
> - show_key_status - Display key status and metadata
> - rotate_stk - Trigger system key rotation
> - disable_key_management - Disable key management for a custodian and
>   namespace (TBD)
> - disable_managed_key - Disable a specific key (TBD)
> - rotate_managed_key - Rotate the active key (TBD)
> - refresh_managed_keys - Refresh all keys for a custodian and namespace
>   (TBD)
>
> Implementation Highlights
>
> - *Backward Compatibility:* Changes are fully compatible with existing
>   encryption-at-rest configuration
> - *Gradual step-by-step migration*: Well defined migration path from
>   existing configuration to new configuration
> - *Performance:* Minimal overhead through efficient caching and lazy key
>   loading
> - *Security:* Cryptographic verification of key metadata, secure key
>   wrapping
> - *Operability:* Administrative tools for key life cycle and cache
>   management
> - *Extensibility:* Plugin architecture for custom key provider
>   implementations
> - *Testing:* Comprehensive unit and integration tests coverage
>
> Architecture
>
> The implementation follows a layered architecture:
>
> 1. *Provider Layer:* Pluggable ManagedKeyProvider for KMS integration
> 2. *Management Layer:* KeyMetaAdmin API for administrative operations
> 3. *Persistence Layer:* KeymetaTableAccessor for metadata storage
> 4. *Cache Layer:* ManagedKeyDataCache and SystemKeyCache for performance
> 5. *Service Layer:* Coprocessor endpoints for client-server
>    communication
>
> Areas for Review
>
> I would particularly appreciate feedback on:
>
> 1. *API Design:* Is the KeymetaAdmin API intuitive and complete for
>    common key management scenarios?
> 2. *Security Model:* Does the double-wrapping architecture (DEK wrapped
>    by STK, STK from KMS) provide appropriate security guarantees?
> 3. *Performance:* Are there potential bottlenecks in the caching
>    strategy or table access patterns?
> 4. *Operational Aspects:* Are the administrative commands sufficient for
>    the needs of operations and monitoring?
> 5. *Testing Coverage:* Are there additional test scenarios we should
>    cover?
> 6. *Documentation:* Is the design document clear? What additional
>    documentation would be helpful?
> 7. *Compatibility:* Any concerns about interaction with existing HBase
>    features?
>
> Next Steps
>
> After incorporating community feedback, I plan to:
>
> 1. Address any issues identified during review
> 2. Implement the work identified for future phases
> 3. Add additional documentation to the reference guide
>
> How to Review
>
> This PR introduces changes across multiple modules. Rather
> than reviewing all 143 files, I recommend focusing on these *core
> components* first:
>
> *Core Architecture:*
>
> 1. Design document (linked above) - architectural overview
> 2. ManagedKeyProvider, KeymetaAdmin, ManagedKeyData interfaces
>    (hbase-common)
> 3. ManagedKeys.proto - protocol definitions
> 4. HMaster and misc. procedure changes - initialization of keymeta in a
>    predictable order
> 5. FixedFileTrailer + reader/writer changes - encode/decode additional
>    encryption key in store files
>
> *Key Implementation:*
>
> 1. KeymetaAdminImpl, KeymetaTableAccessor, ManagedKeyUtils,
>    SystemKeyManager, SystemKeyAccessor - admin operations and persistence
> 2. ManagedKeyDataCache, SystemKeyCache - caching layer
> 3. SecurityUtil - encryption context creation
>
> *Client & Shell:*
>
> 1. KeymetaAdminClient - client API
> 2. Shell commands and Ruby wrappers
>
> *Tests & Examples:*
>
> 1. TestKeymetaAdminImpl, TestManagedKeymeta - for usage patterns
> 2. key_provider_keymeta_migration_test.rb - E2E migration steps
>
> *Note:* The remaining ~120 files contain secondary changes (API updates,
> test helpers, configuration constants, etc.) that can be reviewed later or
> skipped for initial feedback.
>
> Please feel free to comment directly on the PR, or reply to this thread
> with questions, concerns, or suggestions.
>
> Thank you for your time and expertise. Your feedback is invaluable in
> ensuring this feature meets the security and operational needs of HBase.
>
> Best regards,
> Hari Krishna Dara
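
For anyone who wants a concrete picture of the hierarchical namespace fallback listed under Key Features (explicit CF namespace attribute, then constructed table/family namespace, then table name, then global namespace), here is a minimal Java sketch. It is not taken from the PR: the class and method names, the "table/family" string format, and the "*" placeholder for the global namespace are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical illustration of the namespace fallback order; not HBase code. */
class NamespaceFallbackSketch {
    // Assumed placeholder for the global namespace; the real identifier may differ.
    static final String GLOBAL_NAMESPACE = "*";

    /** Builds the candidate namespaces from most specific to least specific. */
    static List<String> candidates(String cfNamespaceAttr, String table, String family) {
        List<String> out = new ArrayList<>();
        if (cfNamespaceAttr != null && !cfNamespaceAttr.isEmpty()) {
            out.add(cfNamespaceAttr);      // 1. explicit CF namespace attribute
        }
        out.add(table + "/" + family);     // 2. constructed table/family namespace (format assumed)
        out.add(table);                    // 3. table name
        out.add(GLOBAL_NAMESPACE);         // 4. global namespace
        return out;
    }

    /** Returns the first candidate that has a key available, or empty if none does. */
    static Optional<String> resolve(List<String> candidates, Set<String> namespacesWithKeys) {
        for (String ns : candidates) {
            if (namespacesWithKeys.contains(ns)) {
                return Optional.of(ns);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Keys exist only at table level and globally in this toy setup.
        Set<String> withKeys = Set.of("orders", GLOBAL_NAMESPACE);
        List<String> c = candidates(null, "orders", "cf1");
        System.out.println(c);
        System.out.println(resolve(c, withKeys).orElse("none"));
    }
}
```

In this toy setup no key exists for the table/family candidate, so resolution falls through to the table name. A set of namespaces stands in for whatever key lookup the real implementation performs at each step.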
