haridsv opened a new pull request, #7618: URL: https://github.com/apache/hbase/pull/7618
This PR implements the key management feature for HBase encryption at rest, building on the API surface and refactoring introduced in the precursor PR (#7584). It supersedes PR #7421, which originally contained most of the changes from this PR as well as those from PR #7584.

Jira: [HBASE-29368](https://issues.apache.org/jira/browse/HBASE-29368)
Design doc: https://docs.google.com/document/d/1ToW_rveXHXUc1F6eFNQfu5LOeMAjzgq6FcYUDbdZrSM/edit?usp=sharing
Discussion thread: https://lists.apache.org/thread/q7g2rr2xcgl64rkn9j3mnokf6fvohp2y

This PR contains the cumulative changes from the feature branch, corresponding to the following sub-tasks:

1. [Phase 1: Key caching and minimal service](https://issues.apache.org/jira/browse/HBASE-29402)
2. [Phase 2: Integrate key management with existing encryption](https://issues.apache.org/jira/browse/HBASE-29495)
3. [Phase 2: Migration path from current encryption to managed encryption](https://issues.apache.org/jira/browse/HBASE-29617)
4. [Phase 2: Admin API to trigger for System Key rotation detection as an alternative to failover.](https://issues.apache.org/jira/browse/HBASE-29643)
5. [Phase 3: Additional key management APIs](https://issues.apache.org/jira/browse/HBASE-29666)

This feature introduces a key management system that extends HBase's existing encryption-at-rest capabilities. The implementation adds key lifecycle management: key rotation, hierarchical namespace resolution for key lookup, key caching, and tighter integration with external key management systems (KMS) so that key lifecycles and external key changes are handled.

**1. Managed Keys Infrastructure**

- Introduction of the `ManagedKeyProvider` interface for pluggable key provider implementations, along the lines of the existing `KeyProvider` interface.
  - The new interface can also return Data Encryption Keys (DEKs) and additional metadata about each key.
  - Comes with the default `ManagedKeyStoreKeyProvider` implementation using Java KeyStore, similar to the existing `KeyStoreKeyProvider`.
- Enables logical key isolation for multi-tenant scenarios through custodian identifiers (future use cases) and the special default global custodian.
- Hierarchical namespace resolution for DEKs with automatic fallback: explicit CF namespace attribute → constructed `table/family` namespace → table name → global namespace

**2. System Key (STK) Management**

- Cluster-wide system key for wrapping data encryption keys (DEKs). This is equivalent to the existing master key, but better managed and more operations-friendly.
- Secure storage in HDFS with support for automatic key rotation during boot up.
- Admin API to trigger key rotation and propagation to all RegionServers without needing a rolling restart.
- Preserves the current double-wrapping architecture: DEKs wrapped by STK, STK sourced from external KMS

**3. KeymetaAdmin API**

- `enableKeyManagement(keyCust, keyNamespace)` - Enable key management for a custodian/namespace pair
- `getManagedKeys(keyCust, keyNamespace)` - Query key status and metadata
- `rotateSTK()` - Check for and propagate new system keys
- `disableKeyManagement(keyCust, keyNamespace)` - Disable all the keys for a custodian/namespace
- `disableManagedKey(keyCust, keyNamespace, keyMetadataHash)` - Disable a specific key
- `rotateManagedKey(keyCust, keyNamespace)` - Rotate the active key
- `refreshManagedKeys(keyCust, keyNamespace)` - Refresh from external KMS to validate all the keys
- Internal cache management operations for convenience and for meeting SLAs

**4. Persistent Key Metadata Storage**

- New system table `hbase:keymeta` for storing key metadata and state, which acts as an `L2` cache.
- Tracks key lifecycle: `ACTIVE`, `INACTIVE`, `DISABLED`, `FAILED` states
- Stores wrapped DEKs and metadata for key lookup without depending on the external KMS.
- Optimized for high-priority access with in-memory column families
- Key metadata tracking with cryptographic hashes for integrity verification

**5. Multi-Layer Caching**

- L1: In-memory Caffeine cache on RegionServers for hot key data
- L2: Keymeta table for persistent key metadata that is shared across all RegionServers
- L3: Dynamic lookup from external KMS as fallback when not found in L2
- Cache invalidation mechanism for key rotation scenarios

**6. HBase Shell Integration**

- `enable_key_management` - Enable key management for a custodian and namespace
- `show_key_status` - Display key status and metadata
- `rotate_stk` - Trigger system key rotation
- `disable_key_management` - Disable key management for a custodian and namespace
- `disable_managed_key` - Disable a specific key
- `rotate_managed_key` - Rotate the active key
- `refresh_managed_keys` - Refresh all keys for a custodian and namespace

- **Backward Compatibility:** Changes are fully compatible with the existing encryption-at-rest configuration
- **Gradual step-by-step migration:** Well-defined migration path from the existing configuration to the new configuration
- **Performance:** Minimal overhead through efficient caching and lazy key loading
- **Security:** Cryptographic verification of key metadata, secure key wrapping
- **Operability:** Administrative tools for key lifecycle and cache management
- **Extensibility:** Plugin architecture for custom key provider implementations
- **Testing:** Comprehensive unit and integration test coverage

The implementation follows a layered architecture:

1. **Provider Layer:** Pluggable `ManagedKeyProvider` for KMS integration
2. **Management Layer:** `KeymetaAdmin` API for administrative operations
3. **Persistence Layer:** `KeymetaTableAccessor` for metadata storage
4. **Cache Layer:** `ManagedKeyDataCache` and `SystemKeyCache` for performance
5. **Service Layer:** Coprocessor endpoints for client-server communication

I would particularly appreciate feedback on:

1. **API Design:** Is the `KeymetaAdmin` API intuitive and complete for common key management scenarios?
2. **Security Model:** Does the double-wrapping architecture (DEK wrapped by STK, STK from KMS) provide appropriate security guarantees?
3. **Performance:** Are there potential bottlenecks in the caching strategy or table access patterns?
4. **Operational Aspects:** Are the administrative commands sufficient for the needs of operations and monitoring?
5. **Testing Coverage:** Are there additional test scenarios we should cover?
6. **Documentation:** Is the design document clear? What additional documentation would be helpful?
7. **Compatibility:** Any concerns about interaction with existing HBase features?

After incorporating community feedback, I plan to:

1. Address any issues identified during review
2. Implement the work identified for future phases
3. Add additional documentation to the reference guide

This PR introduces changes across multiple modules, so I recommend focusing on these **core components** first:

**Core Architecture:**

1. Design document (linked above) - architectural overview
2. `ManagedKeyProvider`, `KeymetaAdmin`, `ManagedKeyData` interfaces (hbase-common)
3. `ManagedKeys.proto` - protocol definitions
4. `HMaster` and misc. procedure changes - initialization of `keymeta` in a predictable order
5. `FixedFileTrailer` + reader/writer changes - encode/decode additional encryption key in store files

**Key Implementation:**

1. `KeymetaAdminImpl`, `KeymetaTableAccessor`, `ManagedKeyUtils`, `SystemKeyManager`, `SystemKeyAccessor` - admin operations and persistence
2. `ManagedKeyDataCache`, `SystemKeyCache` - caching layer
3. `SecurityUtil` - encryption context creation

**Client & Shell:**

1. `KeymetaAdminClient` - client API
2. Shell commands and Ruby wrappers

**Tests & Examples:**

1. `TestKeymetaAdminImpl`, `TestManagedKeymeta` - for usage patterns
2. `key_provider_keymeta_migration_test.rb` - E2E migration steps

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
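
For reviewers who want a concrete picture of the namespace fallback order listed under "Managed Keys Infrastructure" above, here is a minimal sketch. It is not code from this PR: the class name `NamespaceFallbackSketch`, the `"*"` placeholder for the global namespace, and the `hasActiveKey` predicate are all hypothetical and exist only to show the ordering (explicit CF attribute, then `table/family`, then table name, then global).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative only: none of these names come from the PR. */
final class NamespaceFallbackSketch {
  /** Hypothetical placeholder for the global namespace; the real constant may differ. */
  static final String GLOBAL_NAMESPACE = "*";

  /** Builds the candidate namespaces in the fallback order described in the PR summary. */
  static List<String> candidates(String explicitCfNamespace, String table, String family) {
    List<String> order = new ArrayList<>();
    if (explicitCfNamespace != null && !explicitCfNamespace.isEmpty()) {
      order.add(explicitCfNamespace);   // 1. explicit CF namespace attribute
    }
    order.add(table + "/" + family);    // 2. constructed table/family namespace
    order.add(table);                   // 3. table name
    order.add(GLOBAL_NAMESPACE);        // 4. global namespace
    return order;
  }

  /** Returns the first candidate for which the (assumed) key lookup reports a usable key. */
  static Optional<String> resolve(List<String> candidates, Predicate<String> hasActiveKey) {
    return candidates.stream().filter(hasActiveKey).findFirst();
  }

  public static void main(String[] args) {
    // Pretend that keys exist only for the table-level and global namespaces.
    Set<String> namespacesWithKeys = Set.of("orders", GLOBAL_NAMESPACE);
    List<String> order = candidates(null, "orders", "info");
    // "orders/info" has no key, so resolution falls back to "orders".
    System.out.println(resolve(order, namespacesWithKeys::contains));
  }
}
```

Separating the ordering from the lookup keeps the fallback policy testable without a running cluster; whether the PR structures it this way is not stated in the description.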

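
The multi-layer caching described above (L1 in-memory, L2 keymeta table, L3 external KMS) can be pictured as a read-through lookup. The sketch below is an assumption-laden illustration, not the PR's `ManagedKeyDataCache`: a `ConcurrentHashMap` stands in for the Caffeine cache, a plain `Map` stands in for the `hbase:keymeta` table, and a `Function` stands in for the KMS call. Writing an L3 result back into L2 and L1 is also an assumption; the description only says that L3 is the fallback when a key is not found in L2.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative only: stand-ins for the three layers named in the PR summary. */
final class TieredKeyLookupSketch {
  private final Map<String, String> l1 = new ConcurrentHashMap<>(); // stand-in for Caffeine
  private final Map<String, String> l2;                             // stand-in for hbase:keymeta
  private final Function<String, Optional<String>> l3;              // stand-in for the KMS

  TieredKeyLookupSketch(Map<String, String> l2, Function<String, Optional<String>> l3) {
    this.l2 = l2;
    this.l3 = l3;
  }

  /** Looks up a wrapped key by id: L1 first, then L2, then L3. */
  Optional<String> get(String keyId) {
    String cached = l1.get(keyId);
    if (cached != null) {
      return Optional.of(cached);
    }
    String persisted = l2.get(keyId);
    if (persisted != null) {
      l1.put(keyId, persisted); // warm L1 from the shared table
      return Optional.of(persisted);
    }
    Optional<String> fetched = l3.apply(keyId);
    // Assumption: a successful KMS lookup is persisted and cached.
    fetched.ifPresent(value -> {
      l2.put(keyId, value);
      l1.put(keyId, value);
    });
    return fetched;
  }

  /** Drops the L1 entry, e.g. after a key rotation, so the next read goes to L2. */
  void invalidate(String keyId) {
    l1.remove(keyId);
  }

  public static void main(String[] args) {
    Map<String, String> table = new HashMap<>();
    TieredKeyLookupSketch lookup =
        new TieredKeyLookupSketch(table, id -> Optional.of("wrapped-" + id));
    System.out.println(lookup.get("k1"));          // first read falls through to the fake KMS
    System.out.println(table.containsKey("k1"));   // the result was written back to the fake table
  }
}
```

In this sketch, invalidation only touches L1, so a rotated key is re-read from the shared table rather than from the KMS.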