Dear HBase Developers,

I am pleased to announce that the key management feature for encryption at rest is now ready for community review. This is a significant enhancement to HBase's security capabilities, and I would greatly appreciate your feedback and insights.

*Pull Request:* https://github.com/apache/hbase/pull/7421
*Branch:* HBASE-29368-key-management-feature
*Primary JIRA:* HBASE-29368
*Design Document:* https://docs.google.com/document/d/1ToW_rveXHXUc1F6eFNQfu5LOeMAjzgq6FcYUDbdZrSM/edit?tab=t.0

Overview

This feature introduces a key management system that extends HBase's existing encryption-at-rest capabilities. The implementation provides key lifecycle management with support for key rotation, hierarchical namespace resolution for key lookup, key caching, and improved integration with key management systems to handle key lifecycles and external key changes.

Key Features

*1. Managed Keys Infrastructure*
- Introduction of the ManagedKeyProvider interface for pluggable key provider implementations, along the lines of the existing KeyProvider interface.
- The new interface can also return Data Encryption Keys (DEKs) and more detail about the keys.
- Comes with the default ManagedKeyStoreKeyProvider implementation using Java KeyStore, similar to the existing KeyStoreKeyProvider.
- Enables logical key isolation for multi-tenant scenarios through custodian identifiers (future use cases) and the special default global custodian.
- Hierarchical namespace resolution for DEKs with automatic fallback: explicit CF namespace attribute → constructed table/family namespace → table name → global namespace

*2. System Key (STK) Management*
- Cluster-wide system key for wrapping data encryption keys (DEKs). This is equivalent to the existing master key, but better managed and more operations-friendly.
- Secure storage in HDFS with support for automatic key rotation during boot up.
- Admin API to trigger key rotation and propagation to all RegionServers without needing a rolling restart.
- Preserves the current double-wrapping architecture: DEKs wrapped by STK, STK sourced from external KMS

*3. KeymetaAdmin API*
- enableKeyManagement(keyCust, keyNamespace) - Enable key management for a custodian/namespace pair
- getManagedKeys(keyCust, keyNamespace) - Query key status and metadata
- rotateSTK() - Check for and propagate new system keys
- disableKeyManagement(keyCust, keyNamespace) - Disable all the keys for a custodian/namespace (TBD)
- disableManagedKey(keyCust, keyNamespace, keyMetadataHash) - Disable a specific key (TBD)
- rotateManagedKey(keyCust, keyNamespace) - Rotate the active key (TBD)
- refreshManagedKeys(keyCust, keyNamespace) - Refresh from external KMS to validate all the keys (TBD)
- Internal cache management operations for convenience and meeting SLAs (TBD)

*4. Persistent Key Metadata Storage*
- New system table hbase:keymeta for storing key metadata and state, which acts as an L2 cache.
- Tracks key lifecycle: ACTIVE, INACTIVE, DISABLED, FAILED states
- Stores wrapped DEKs and metadata for key lookup without depending on external KMS.
- Optimized for high-priority access with in-memory column families
- Key metadata tracking with cryptographic hashes for integrity verification

*5. Multi-Layer Caching*
- L1: In-memory Caffeine cache on RegionServers for hot key data
- L2: Keymeta table for persistent key metadata that is shared across all RegionServers.
- L3: Dynamic lookup from external KMS as fallback when not found in L2.
- Cache invalidation mechanism for key rotation scenarios

*6. HBase Shell Integration*
- enable_key_management - Enable key management for a custodian and namespace
- show_key_status - Display key status and metadata
- rotate_stk - Trigger system key rotation
- disable_key_management - Disable key management for a custodian and namespace (TBD)
- disable_managed_key - Disable a specific key (TBD)
- rotate_managed_key - Rotate the active key (TBD)
- refresh_managed_keys - Refresh all keys for a custodian and namespace (TBD)

Implementation Highlights

- *Backward Compatibility:* Changes are fully compatible with existing encryption-at-rest configuration
- *Gradual step-by-step migration:* Well-defined migration path from existing configuration to new configuration
- *Performance:* Minimal overhead through efficient caching and lazy key loading
- *Security:* Cryptographic verification of key metadata, secure key wrapping
- *Operability:* Administrative tools for key lifecycle and cache management
- *Extensibility:* Plugin architecture for custom key provider implementations
- *Testing:* Comprehensive unit and integration test coverage

Architecture

The implementation follows a layered architecture:

1. *Provider Layer:* Pluggable ManagedKeyProvider for KMS integration
2. *Management Layer:* KeymetaAdmin API for administrative operations
3. *Persistence Layer:* KeymetaTableAccessor for metadata storage
4. *Cache Layer:* ManagedKeyDataCache and SystemKeyCache for performance
5. *Service Layer:* Coprocessor endpoints for client-server communication

Areas for Review

I would particularly appreciate feedback on:

1. *API Design:* Is the KeymetaAdmin API intuitive and complete for common key management scenarios?
2. *Security Model:* Does the double-wrapping architecture (DEK wrapped by STK, STK from KMS) provide appropriate security guarantees?
3. *Performance:* Are there potential bottlenecks in the caching strategy or table access patterns?
4. *Operational Aspects:* Are the administrative commands sufficient for the needs of operations and monitoring?
5. *Testing Coverage:* Are there additional test scenarios we should cover?
6. *Documentation:* Is the design document clear? What additional documentation would be helpful?
7. *Compatibility:* Any concerns about interaction with existing HBase features?

Next Steps

After incorporating community feedback, I plan to:

1. Address any issues identified during review
2. Implement the work identified for future phases
3. Add documentation to the reference guide

How to Review

This PR introduces changes across multiple modules. Rather than reviewing all 143 files, I recommend focusing on these *core components* first:

*Core Architecture:*
1. Design document (linked above) - architectural overview
2. ManagedKeyProvider, KeymetaAdmin, ManagedKeyData interfaces (hbase-common)
3. ManagedKeys.proto - protocol definitions
4. HMaster and misc. procedure changes - initialization of keymeta in a predictable order
5. FixedFileTrailer + reader/writer changes - encode/decode additional encryption key in store files

*Key Implementation:*
1. KeymetaAdminImpl, KeymetaTableAccessor, ManagedKeyUtils, SystemKeyManager, SystemKeyAccessor - admin operations and persistence
2. ManagedKeyDataCache, SystemKeyCache - caching layer
3. SecurityUtil - encryption context creation

*Client & Shell:*
1. KeymetaAdminClient - client API
2. Shell commands and Ruby wrappers

*Tests & Examples:*
1. TestKeymetaAdminImpl, TestManagedKeymeta - for usage patterns
2. key_provider_keymeta_migration_test.rb - E2E migration steps

*Note:* The remaining ~120 files contain secondary changes (API updates, test helpers, configuration constants, etc.) that can be reviewed later or skipped for initial feedback.

Please feel free to comment directly on the PR, or reply to this thread with questions, concerns, or suggestions.

Thank you for your time and expertise. Your feedback is invaluable in ensuring this feature meets the security and operational needs of HBase.

Best regards,
Hari Krishna Dara
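
P.S. For reviewers who want a concrete picture of the lookup order listed under Multi-Layer Caching, here is a minimal read-through sketch. It is illustrative only and is not the code in the PR: the class name TieredKeyLookup, the use of plain maps in place of the Caffeine cache and the hbase:keymeta table, the string key type, and the backfill of L2 and L1 after a KMS hit are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical three-tier lookup; names and types are illustrative, not taken from the PR. */
final class TieredKeyLookup {
  private final Map<String, String> l1; // stands in for the in-memory cache on a RegionServer
  private final Map<String, String> l2; // stands in for the shared hbase:keymeta table
  private final Function<String, Optional<String>> kms; // stands in for the external KMS
  int kmsCalls = 0; // counts how often the slowest tier was consulted

  TieredKeyLookup(Map<String, String> l1, Map<String, String> l2,
      Function<String, Optional<String>> kms) {
    this.l1 = l1;
    this.l2 = l2;
    this.kms = kms;
  }

  /** Checks L1, then L2, then the KMS stand-in, filling the faster tiers on the way back. */
  Optional<String> find(String keyId) {
    String hit = l1.get(keyId);
    if (hit != null) {
      return Optional.of(hit);
    }
    hit = l2.get(keyId);
    if (hit == null) {
      kmsCalls++;
      Optional<String> fetched = kms.apply(keyId);
      if (!fetched.isPresent()) {
        return Optional.empty();
      }
      hit = fetched.get();
      l2.put(keyId, hit); // assumption: a KMS result is persisted to the shared tier
    }
    l1.put(keyId, hit); // assumption: any L2 or KMS result is cached locally
    return Optional.of(hit);
  }

  /** Drops the local entry, e.g. after a rotation; L2 is left untouched in this sketch. */
  void invalidate(String keyId) {
    l1.remove(keyId);
  }

  public static void main(String[] args) {
    TieredKeyLookup lookup = new TieredKeyLookup(
        new HashMap<>(), new HashMap<>(), id -> Optional.of("wrapped-dek-for-" + id));
    System.out.println(lookup.find("cf1").get()); // falls through to the KMS stand-in
    System.out.println(lookup.find("cf1").get()); // served from L1
    System.out.println("KMS calls: " + lookup.kmsCalls);
  }
}
```

In this sketch, invalidating an L1 entry sends the next lookup to L2 rather than to the KMS, which is the property that keeps region servers independent of KMS availability for keys already recorded in the shared tier.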
