Dear HBase Developers,

I am pleased to announce that the key management feature for encryption at
rest is now ready for community review. This is a significant enhancement
to HBase's security capabilities, and I would greatly appreciate your
feedback and insights.

*Pull Request:* https://github.com/apache/hbase/pull/7421
*Branch:* HBASE-29368-key-management-feature
*Primary JIRA:* HBASE-29368
*Design Document:*
https://docs.google.com/document/d/1ToW_rveXHXUc1F6eFNQfu5LOeMAjzgq6FcYUDbdZrSM/edit?tab=t.0

Overview

This feature introduces a key management system that extends HBase's
existing encryption-at-rest capabilities. The implementation provides key
lifecycle management with support for key rotation, hierarchical namespace
resolution for key lookup, key caching, and tighter integration with
external key management systems (KMS), so that HBase can track key
lifecycles and react to key changes made outside the cluster.

Key Features

*1. Managed Keys Infrastructure*

   - Introduction of the ManagedKeyProvider interface for pluggable key
   provider implementations, along the lines of the existing KeyProvider
   interface.
   - The new interface can also return Data Encryption Keys (DEKs) along
   with additional metadata about each key.
   - Comes with the default ManagedKeyStoreKeyProvider implementation using
   Java KeyStore, similar to the existing KeyStoreKeyProvider.
   - Enables logical key isolation for multi-tenant scenarios through
   custodian identifiers (future use cases) and the special default global
   custodian.
   - Hierarchical namespace resolution for DEKs with automatic fallback:
   explicit CF namespace attribute → constructed table/family namespace →
   table name → global namespace
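
The fallback order above can be sketched as a first-match search over an
ordered candidate list. This is only an illustration, not the code in the PR:
the class name, the "*" value for the global namespace, and the "/" separator
in the constructed table/family namespace are all assumptions made for the
example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Illustrative sketch only; not the resolver class used in the PR. */
final class KeyNamespaceResolverSketch {
  // Hypothetical marker for the global namespace; the PR may use another value.
  static final String GLOBAL_NAMESPACE = "*";

  /** Builds the candidate namespaces, ordered from most to least specific. */
  static List<String> candidates(String cfNamespaceAttr, String table, String family) {
    List<String> out = new ArrayList<>();
    if (cfNamespaceAttr != null && !cfNamespaceAttr.isEmpty()) {
      out.add(cfNamespaceAttr);      // 1. explicit CF namespace attribute
    }
    out.add(table + "/" + family);   // 2. constructed table/family namespace (separator assumed)
    out.add(table);                  // 3. table name
    out.add(GLOBAL_NAMESPACE);       // 4. global namespace
    return out;
  }

  /** Returns the first candidate for which a usable key exists, if any. */
  static Optional<String> resolve(List<String> candidates, Predicate<String> hasKey) {
    return candidates.stream().filter(hasKey).findFirst();
  }
}
```

Here hasKey stands in for whatever lookup decides that a namespace has a
usable key; an empty result means no level of the hierarchy matched.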

*2. System Key (STK) Management*

   - Cluster-wide system key for wrapping data encryption keys (DEKs). This
   is equivalent to the existing master key, but is better managed and more
   operations-friendly.
   - Secure storage in HDFS with support for automatic key rotation during
   startup.
   - Admin API to trigger key rotation and propagate the new key to all
   RegionServers without a rolling restart.
   - Preserves the current double-wrapping architecture: DEKs wrapped by
   STK, STK sourced from external KMS
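
As a rough illustration of a DEK being wrapped by the STK, the sketch below
uses the JDK's standard "AESWrap" cipher. That cipher is a stand-in chosen
for the example and is not necessarily the wrapping scheme HBase or this PR
uses; the class and method names are hypothetical, and both keys are
generated locally here, whereas the STK would come from the external KMS.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Illustrative sketch only; HBase has its own key-wrapping utilities. */
final class KeyWrapSketch {
  /** Generates a random 128-bit AES key (stands in for both the STK and a DEK). */
  static SecretKey newAesKey() {
    try {
      KeyGenerator gen = KeyGenerator.getInstance("AES");
      gen.init(128);
      return gen.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Wraps the DEK under the STK so only the wrapped bytes need to be stored. */
  static byte[] wrap(SecretKey stk, SecretKey dek) {
    try {
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.WRAP_MODE, stk);
      return cipher.wrap(dek);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Recovers the DEK from its wrapped form using the same STK. */
  static SecretKey unwrap(SecretKey stk, byte[] wrappedDek) {
    try {
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.UNWRAP_MODE, stk);
      return (SecretKey) cipher.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With this layering, rotating the STK only requires re-wrapping the DEKs; the
data encrypted under those DEKs does not need to be rewritten.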

*3. KeymetaAdmin API*

   - enableKeyManagement(keyCust, keyNamespace) - Enable key management for
   a custodian/namespace pair
   - getManagedKeys(keyCust, keyNamespace) - Query key status and metadata
   - rotateSTK() - Check for and propagate new system keys
   - disableKeyManagement(keyCust, keyNamespace) - Disable all the keys for
   a custodian/namespace (TBD)
   - disableManagedKey(keyCust, keyNamespace, keyMetadataHash) - Disable a
   specific key (TBD)
   - rotateManagedKey(keyCust, keyNamespace) - Rotate the active key (TBD)
   - refreshManagedKeys(keyCust, keyNamespace) - Refresh from external KMS
   to validate all the keys. (TBD)
   - Internal cache management operations, for convenience and for meeting
   SLAs. (TBD)

*4. Persistent Key Metadata Storage*

   - New system table hbase:keymeta for storing key metadata and state,
   which also acts as an L2 cache.
   - Tracks key lifecycle: ACTIVE, INACTIVE, DISABLED, FAILED states
   - Stores wrapped DEKs and metadata for key lookup without depending on
   external KMS.
   - Optimized for high-priority access with in-memory column families
   - Key metadata tracking with cryptographic hashes for integrity
   verification
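
To illustrate the idea of hash-based integrity verification together with
the lifecycle states listed above, here is a minimal sketch. SHA-256, the
string form of the metadata, and all names are assumptions made for the
example; the PR may hash different inputs with a different algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative sketch only; not the metadata handling used in the PR. */
final class KeyMetadataHashSketch {
  /** Lifecycle states as named in this announcement. */
  enum KeyState { ACTIVE, INACTIVE, DISABLED, FAILED }

  /** Hashes the key metadata; SHA-256 is an assumption made for this example. */
  static byte[] hash(String metadata) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      return digest.digest(metadata.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Returns true if the metadata still matches the hash stored alongside it. */
  static boolean matches(String metadata, byte[] storedHash) {
    // MessageDigest.isEqual compares the two digests in constant time.
    return MessageDigest.isEqual(hash(metadata), storedHash);
  }
}
```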

*5. Multi-Layer Caching*

   - L1: In-memory Caffeine cache on RegionServers for hot key data
   - L2: Keymeta table for persistent key metadata that is shared across
   all RegionServers.
   - L3: Dynamic lookup from external KMS as fallback when not found in L2.
   - Cache invalidation mechanism for key rotation scenarios
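
The L1 → L2 → L3 lookup order can be sketched as follows. This is a
simplified illustration, not the PR's ManagedKeyDataCache: a
ConcurrentHashMap stands in for the Caffeine cache, the L2 and L3 tiers are
passed in as plain functions, and the class name is hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative sketch only; the PR's cache classes are more involved. */
final class TieredKeyLookupSketch<V> {
  // L1: in-memory map (a stand-in for the Caffeine cache on each RegionServer).
  private final Map<String, V> l1 = new ConcurrentHashMap<>();
  // L2: shared persistent store, such as a lookup against the keymeta table.
  private final Function<String, Optional<V>> l2;
  // L3: external KMS lookup, consulted only when L2 has no entry.
  private final Function<String, Optional<V>> l3;

  TieredKeyLookupSketch(Function<String, Optional<V>> l2, Function<String, Optional<V>> l3) {
    this.l2 = l2;
    this.l3 = l3;
  }

  /** Checks L1, then L2, then L3, and caches any hit in L1. */
  Optional<V> get(String keyId) {
    V cached = l1.get(keyId);
    if (cached != null) {
      return Optional.of(cached);
    }
    Optional<V> found = l2.apply(keyId);
    if (!found.isPresent()) {
      // A fuller version would presumably also persist an L3 hit back into L2.
      found = l3.apply(keyId);
    }
    found.ifPresent(value -> l1.put(keyId, value));
    return found;
  }

  /** Drops an L1 entry, for example after a key rotation. */
  void invalidate(String keyId) {
    l1.remove(keyId);
  }
}
```

After invalidate, the next get for that key falls through to L2 or L3 again,
which is the behavior a rotation needs.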

*6. HBase Shell Integration*

   - enable_key_management - Enable key management for a custodian and
   namespace
   - show_key_status - Display key status and metadata
   - rotate_stk - Trigger system key rotation
   - disable_key_management - Disable key management for a custodian and
   namespace (TBD)
   - disable_managed_key - Disable a specific key (TBD)
   - rotate_managed_key - Rotate the active key (TBD)
   - refresh_managed_keys - Refresh all keys for a custodian and namespace
   (TBD)

Implementation Highlights

   - *Backward Compatibility:* Changes are fully compatible with existing
   encryption-at-rest configuration
   - *Gradual Migration:* Well-defined, step-by-step migration path from
   the existing configuration to the new configuration
   - *Performance:* Minimal overhead through efficient caching and lazy key
   loading
   - *Security:* Cryptographic verification of key metadata, secure key
   wrapping
   - *Operability:* Administrative tools for key life cycle and cache
   management
   - *Extensibility:* Plugin architecture for custom key provider
   implementations
   - *Testing:* Comprehensive unit and integration test coverage

Architecture

The implementation follows a layered architecture:


   1. *Provider Layer:* Pluggable ManagedKeyProvider for KMS integration
   2. *Management Layer:* KeymetaAdmin API for administrative operations
   3. *Persistence Layer:* KeymetaTableAccessor for metadata storage
   4. *Cache Layer:* ManagedKeyDataCache and SystemKeyCache for performance
   5. *Service Layer:* Coprocessor endpoints for client-server communication

Areas for Review

I would particularly appreciate feedback on:


   1. *API Design:* Is the KeymetaAdmin API intuitive and complete for
   common key management scenarios?
   2. *Security Model:* Does the double-wrapping architecture (DEK wrapped
   by STK, STK from KMS) provide appropriate security guarantees?
   3. *Performance:* Are there potential bottlenecks in the caching
   strategy or table access patterns?
   4. *Operational Aspects:* Are the administrative commands sufficient for
   the needs of operations and monitoring?
   5. *Testing Coverage:* Are there additional test scenarios we should
   cover?
   6. *Documentation:* Is the design document clear? What additional
   documentation would be helpful?
   7. *Compatibility:* Any concerns about interaction with existing HBase
   features?

Next Steps

After incorporating community feedback, I plan to:

   1. Address any issues identified during review
   2. Implement the work identified for future phases
   3. Add additional documentation to the reference guide

How to Review

This PR introduces changes across multiple modules. Rather than reviewing
all 143 files, I recommend focusing on these *core components* first:

*Core Architecture:*

   1. Design document (linked above) - architectural overview
   2. ManagedKeyProvider, KeymetaAdmin, ManagedKeyData interfaces
   (hbase-common)
   3. ManagedKeys.proto - protocol definitions
   4. HMaster and misc. procedure changes - initialization of keymeta in a
   predictable order
   5. FixedFileTrailer + reader/writer changes - encode/decode additional
   encryption key in store files

*Key Implementation:*

   1. KeymetaAdminImpl, KeymetaTableAccessor, ManagedKeyUtils,
   SystemKeyManager, SystemKeyAccessor - admin operations and persistence
   2. ManagedKeyDataCache, SystemKeyCache - caching layer
   3. SecurityUtil - encryption context creation

*Client & Shell:*

   1. KeymetaAdminClient - client API
   2. Shell commands and Ruby wrappers

*Tests & Examples:*

   1. TestKeymetaAdminImpl, TestManagedKeymeta - for usage patterns
   2. key_provider_keymeta_migration_test.rb - E2E migration steps

*Note:* The remaining ~120 files contain secondary changes (API updates,
test helpers, configuration constants, etc.) that can be reviewed later or
skipped for initial feedback.

Please feel free to comment directly on the PR, or reply to this thread
with questions, concerns, or suggestions.

Thank you for your time and expertise. Your feedback is invaluable in
ensuring this feature meets the security and operational needs of HBase.

Best regards,
Hari Krishna Dara
