Bumping up this thread as I have addressed all the TBD items since the PR was raised. Any input would be appreciated!
Thank you,
Hari

PS: I reformatted the original message quoted below for better readability.

On 2025/11/10 14:27:11 Hari Krishna Dara wrote:
> Dear HBase Developers,
>
> I am pleased to announce that the key management feature for encryption at
> rest is now ready for community review. This is a significant enhancement
> to HBase's security capabilities, and I would greatly appreciate your
> feedback and insights.
>
> Pull Request: https://github.com/apache/hbase/pull/7421
> Branch: HBASE-29368-key-management-feature
> Primary JIRA: HBASE-29368
> Design Document:
> https://docs.google.com/document/d/1ToW_rveXHXUc1F6eFNQfu5LOeMAjzgq6FcYUDbdZrSM/edit?tab=t.0
>
> Overview
>
> This feature introduces a comprehensive key management system that
> extends HBase's existing encryption-at-rest capabilities. The
> implementation provides enterprise-grade key lifecycle management with
> support for key rotation, hierarchical namespace resolution for key lookup,
> key caching and improved integration with key management systems to handle
> key life cycles and external key changes.
>
> Key Features
>
> 1. Managed Keys Infrastructure
>
>    - Introduction of ManagedKeyProvider interface for pluggable key
>      provider implementations on the lines of the existing KeyProvider
>      interface.
>    - The new interface can also return Data Encryption Keys (DEKs) and a
>      lot more details on the keys.
>    - Comes with the default ManagedKeyStoreKeyProvider implementation using
>      Java KeyStore, similar to the existing KeyStoreKeyProvider.
>    - Enables logical key isolation for multi-tenant scenarios through
>      custodian identifiers (future use cases) and the special default global
>      custodian.
>    - Hierarchical namespace resolution for DEKs with automatic fallback:
>      explicit CF namespace attribute → constructed table/family namespace →
>      table name → global namespace
>
> 2. System Key (STK) Management
>
>    - Cluster-wide system key for wrapping data encryption keys (DEKs). This
>      is equivalent to the existing master key, but better managed and
>      operation friendly.
>    - Secure storage in HDFS with support for automatic key rotation during
>      boot up.
>    - Admin API to trigger key rotation and propagation to all RegionServers
>      without needing to do a rolling restart.
>    - Preserves the current double-wrapping architecture: DEKs wrapped by
>      STK, STK sourced from external KMS
>
> 3. KeymetaAdmin API
>
>    - enableKeyManagement(keyCust, keyNamespace) - Enable key management for
>      a custodian/namespace pair
>    - getManagedKeys(keyCust, keyNamespace) - Query key status and metadata
>    - rotateSTK() - Check for and propagate new system keys
>    - disableKeyManagement(keyCust, keyNamespace) - Disable all the keys for
>      a custodian/namespace (TBD)
>    - disableManagedKey(keyCust, keyNamespace, keyMetadataHash) - Disable a
>      specific key (TBD)
>    - rotateManagedKey(keyCust, keyNamespace) - Rotate the active key (TBD)
>    - refreshManagedKeys(keyCust, keyNamespace) - Refresh from external KMS
>      to validate all the keys. (TBD)
>    - Internal cache management operations for convenience and meeting SLAs.
>      (TBD)
>
> 4. Persistent Key Metadata Storage
>
>    - New system table hbase:keymeta for storing key metadata and state
>      which acts as an L2 cache.
>    - Tracks key lifecycle: ACTIVE, INACTIVE, DISABLED, FAILED states
>    - Stores wrapped DEKs and metadata for key lookup without depending on
>      external KMS.
>    - Optimized for high-priority access with in-memory column families
>    - Key metadata tracking with cryptographic hashes for integrity
>      verification
>
> 5. Multi-Layer Caching
>
>    - L1: In-memory Caffeine cache on RegionServers for hot key data
>    - L2: Keymeta table for persistent key metadata that is shared across
>      all RegionServers.
>    - L3: Dynamic lookup from external KMS as fallback when not found in L2.
>    - Cache invalidation mechanism for key rotation scenarios
>
> 6. HBase Shell Integration
>
>    - enable_key_management - Enable key management for a custodian and
>      namespace
>    - show_key_status - Display key status and metadata
>    - rotate_stk - Trigger system key rotation
>    - disable_key_management - Disable key management for a custodian and
>      namespace (TBD)
>    - disable_managed_key - Disable a specific key (TBD)
>    - rotate_managed_key - Rotate the active key (TBD)
>    - refresh_managed_keys - Refresh all keys for a custodian and namespace
>      (TBD)
>
> Implementation Highlights
>
>    - Backward Compatibility: Changes are fully compatible with existing
>      encryption-at-rest configuration
>    - Gradual step-by-step migration: Well defined migration path from
>      existing configuration to new configuration
>    - Performance: Minimal overhead through efficient caching and lazy key
>      loading
>    - Security: Cryptographic verification of key metadata, secure key
>      wrapping
>    - Operability: Administrative tools for key life cycle and cache
>      management
>    - Extensibility: Plugin architecture for custom key provider
>      implementations
>    - Testing: Comprehensive unit and integration tests coverage
>
> Architecture
>
> The implementation follows a layered architecture:
>
>    1. Provider Layer: Pluggable ManagedKeyProvider for KMS integration
>    2. Management Layer: KeyMetaAdmin API for administrative operations
>    3. Persistence Layer: KeymetaTableAccessor for metadata storage
>    4. Cache Layer: ManagedKeyDataCache and SystemKeyCache for performance
>    5. Service Layer: Coprocessor endpoints for client-server communication
>
> Areas for Review
>
> I would particularly appreciate feedback on:
>
>    1. API Design: Is the KeymetaAdmin API intuitive and complete for
>       common key management scenarios?
>    2. Security Model: Does the double-wrapping architecture (DEK wrapped
>       by STK, STK from KMS) provide appropriate security guarantees?
>    3. Performance: Are there potential bottlenecks in the caching
>       strategy or table access patterns?
>    4. Operational Aspects: Are the administrative commands sufficient for
>       the needs of operations and monitoring?
>    5. Testing Coverage: Are there additional test scenarios we should
>       cover?
>    6. Documentation: Is the design document clear? What additional
>       documentation would be helpful?
>    7. Compatibility: Any concerns about interaction with existing HBase
>       features?
>
> Next Steps
>
> After incorporating community feedback, I plan to:
>
>    1. Address any issues identified during review
>    2. Implement the work identified for future phases
>    3. Add additional documentation to the reference guide
>
> How to Review
>
> This PR introduces changes across multiple modules. Rather
> than reviewing all 143 files, I recommend focusing on these core
> components first:
>
> Core Architecture:
>
>    1. Design document (linked above) - architectural overview
>    2. ManagedKeyProvider, KeymetaAdmin, ManagedKeyData interfaces
>       (hbase-common)
>    3. ManagedKeys.proto - protocol definitions
>    4. HMaster and misc. procedure changes - initialization of keymeta in a
>       predictable order
>    5. FixedFileTrailer + reader/writer changes - encode/decode additional
>       encryption key in store files
>
> Key Implementation:
>
>    1. KeymetaAdminImpl, KeymetaTableAccessor, ManagedKeyUtils,
>       SystemKeyManager, SystemKeyAccessor - admin operations and persistence
>    2. ManagedKeyDataCache, SystemKeyCache - caching layer
>    3. SecurityUtil - encryption context creation
>
> Client & Shell:
>
>    1. KeymetaAdminClient - client API
>    2. Shell commands and Ruby wrappers
>
> Tests & Examples:
>
>    1. TestKeymetaAdminImpl, TestManagedKeymeta - for usage patterns
>    2. key_provider_keymeta_migration_test.rb - E2E migration steps
>
> Note: The remaining files contain secondary changes (API updates,
> test helpers, configuration constants, etc.) that can be reviewed later or
> skipped for initial feedback.
>
> Please feel free to comment directly on the PR, or reply to this thread
> with questions, concerns, or suggestions.
>
> Thank you for your time and expertise. Your feedback is invaluable in
> ensuring this feature meets the security and operational needs of HBase.
>
> Best regards,
> Hari Krishna Dara
>
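
For reviewers who want a quick mental model of the multi-layer caching described in the quoted message (L1 in-memory cache, L2 keymeta table, L3 external KMS fallback), here is a minimal sketch of that lookup order. It is an illustration only and not the code in the PR: the class TieredKeyLookup and its methods are hypothetical, a plain ConcurrentHashMap stands in for the Caffeine cache, a Map stands in for hbase:keymeta, and a Function stands in for the KMS call. Writing an L3 result back into L2 is also an assumption on my part.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative three-tier key lookup. All names are hypothetical, not taken from the PR. */
final class TieredKeyLookup {
    // L1: per-process in-memory cache (stand-in for the Caffeine cache).
    private final Map<String, byte[]> l1 = new ConcurrentHashMap<>();
    // L2: shared persistent store (stand-in for the hbase:keymeta table).
    private final Map<String, byte[]> l2;
    // L3: external KMS lookup, used only when L1 and L2 both miss.
    private final Function<String, Optional<byte[]>> l3;

    TieredKeyLookup(Map<String, byte[]> l2, Function<String, Optional<byte[]>> l3) {
        this.l2 = l2;
        this.l3 = l3;
    }

    /** Returns the wrapped key for keyId, consulting L1, then L2, then L3. */
    Optional<byte[]> get(String keyId) {
        byte[] hit = l1.get(keyId);
        if (hit != null) {
            return Optional.of(hit);
        }
        hit = l2.get(keyId);
        if (hit == null) {
            Optional<byte[]> fetched = l3.apply(keyId);
            if (!fetched.isPresent()) {
                return Optional.empty();
            }
            hit = fetched.get();
            l2.put(keyId, hit); // assumption: L3 results are persisted to L2
        }
        l1.put(keyId, hit); // populate L1 so the next lookup is served from memory
        return Optional.of(hit);
    }

    /** Drops the L1 entry, e.g. after a key rotation, so the next get() rereads L2. */
    void invalidate(String keyId) {
        l1.remove(keyId);
    }
}
```

In this sketch a rotation only needs to invalidate L1, because L2 is shared across servers and is consulted again on the next miss.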

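The namespace fallback listed under Managed Keys Infrastructure (explicit CF namespace attribute → constructed table/family namespace → table name → global namespace) can be sketched the same way. Again, this is only an illustration under stated assumptions: KeyNamespaceResolver is a hypothetical name, the "table/family" string format and the "*" global namespace value are placeholders I picked for the example, and the real constants and formats are in the PR and design document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Illustrative namespace fallback chain. All names and formats are hypothetical. */
final class KeyNamespaceResolver {
    // Assumed placeholder for the global namespace; the real value may differ.
    static final String GLOBAL_NAMESPACE = "*";

    /** Builds the candidate namespaces in fallback order, most specific first. */
    static List<String> candidates(String cfNamespaceAttr, String table, String family) {
        List<String> out = new ArrayList<>();
        if (cfNamespaceAttr != null && !cfNamespaceAttr.isEmpty()) {
            out.add(cfNamespaceAttr);      // 1. explicit CF namespace attribute
        }
        out.add(table + "/" + family);     // 2. constructed table/family namespace (assumed format)
        out.add(table);                    // 3. table name
        out.add(GLOBAL_NAMESPACE);         // 4. global namespace
        return out;
    }

    /** Returns the first candidate for which a key is available, if any. */
    static Optional<String> resolve(List<String> candidates, Predicate<String> hasActiveKey) {
        for (String ns : candidates) {
            if (hasActiveKey.test(ns)) {
                return Optional.of(ns);
            }
        }
        return Optional.empty();
    }
}
```

For example, with keys present only for the table name and the global namespace, a column family without an explicit attribute would resolve to the table-level namespace in this sketch.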