Hi, I'd like to check it out later.

I used to maintain Hadoop KMS for HDFS and Ozone. It took us many years to
stabilize it and improve its scale and performance, so we learned a lot of
lessons along the way.
(I suppose HDFS encryption at rest was not considered, for whatever reason.)

On Mon, Nov 10, 2025 at 6:28 AM Hari Krishna Dara <[email protected]>
wrote:

> Dear HBase Developers,
>
> I am pleased to announce that the key management feature for encryption at
> rest is now ready for community review. This is a significant enhancement
> to HBase's security capabilities, and I would greatly appreciate your
> feedback and insights.
>
> *Pull Request:* https://github.com/apache/hbase/pull/7421
> *Branch:* HBASE-29368-key-management-feature
> *Primary JIRA:* HBASE-29368
> *Design Document:*
>
> https://docs.google.com/document/d/1ToW_rveXHXUc1F6eFNQfu5LOeMAjzgq6FcYUDbdZrSM/edit?tab=t.0
>
> *Overview*
>
> This feature introduces a comprehensive key management system that
> extends HBase's existing encryption-at-rest capabilities. The
> implementation provides enterprise-grade key lifecycle management with
> support for key rotation, hierarchical namespace resolution for key lookup,
> key caching and improved integration with key management systems to handle
> key life cycles and external key changes.
>
> *Key Features*
>
> *1. Managed Keys Infrastructure*
>
>    - Introduction of ManagedKeyProvider interface for pluggable key
>    provider implementations along the lines of the existing KeyProvider
>    interface.
>    - The new interface can also return Data Encryption Keys (DEKs) and
>    much more detailed metadata about the keys.
>    - Comes with the default ManagedKeyStoreKeyProvider implementation using
>    Java KeyStore, similar to the existing KeyStoreKeyProvider.
>    - Enables logical key isolation for multi-tenant scenarios through
>    custodian identifiers (future use cases) and the special default global
>    custodian.
>    - Hierarchical namespace resolution for DEKs with automatic fallback:
>    explicit CF namespace attribute → constructed table/family namespace →
>    table name → global namespace
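The DEK namespace fallback reads naturally as a first-match search over candidate namespaces, from most to least specific. A minimal sketch, not code from the PR: the class name, the `table/family` format of the constructed namespace, the `*` marker for the global namespace, and the set of "namespaces that have keys" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration of the fallback order; not HBase code.
final class NamespaceResolver {
  // Assumed marker for the global namespace; the real constant may differ.
  static final String GLOBAL_NAMESPACE = "*";

  private final Set<String> namespacesWithKeys;

  NamespaceResolver(Set<String> namespacesWithKeys) {
    this.namespacesWithKeys = namespacesWithKeys;
  }

  /** Candidate namespaces in priority order, most specific first. */
  static List<String> candidates(String cfNamespaceAttr, String table, String family) {
    List<String> out = new ArrayList<>();
    if (cfNamespaceAttr != null && !cfNamespaceAttr.isEmpty()) {
      out.add(cfNamespaceAttr);      // 1. explicit CF namespace attribute
    }
    out.add(table + "/" + family);   // 2. constructed table/family namespace (format assumed)
    out.add(table);                  // 3. table name
    out.add(GLOBAL_NAMESPACE);       // 4. global namespace
    return out;
  }

  /** Returns the first candidate namespace for which a key is available, if any. */
  Optional<String> resolve(String cfNamespaceAttr, String table, String family) {
    return candidates(cfNamespaceAttr, table, family).stream()
        .filter(namespacesWithKeys::contains)
        .findFirst();
  }
}
```

With keys only for `orders` and the global namespace, a family of table `orders` carrying an unknown explicit attribute falls back to `orders`, while any other table falls back to the global namespace.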
>
> *2. System Key (STK) Management*
>
>    - Cluster-wide system key for wrapping data encryption keys (DEKs). This
>    is equivalent to the existing master key, but better managed and more
>    operations-friendly.
>    - Secure storage in HDFS with support for automatic key rotation during
>    boot up.
>    - Admin API to trigger key rotation and propagation to all RegionServers
>    without needing to do a rolling restart.
>    - Preserves the current double-wrapping architecture: DEKs wrapped by
>    STK, STK sourced from external KMS
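To make the double wrapping concrete: each DEK is wrapped by the STK and only the wrapped bytes are persisted, while the STK itself comes from the external KMS. The sketch below uses the JDK's standard `AESWrap` (RFC 3394) cipher purely for illustration; that choice is an assumption, not the wrapping format used by HBase or this PR, and the class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical illustration of "DEK wrapped by STK"; not the PR's wrapping format.
final class DoubleWrapSketch {
  /** Generates a random 256-bit AES key (stands in for both the STK and a DEK). */
  static SecretKey newAesKey() throws GeneralSecurityException {
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(256);
    return gen.generateKey();
  }

  /** Wraps a DEK with the system key; only these bytes would be stored. */
  static byte[] wrapDek(SecretKey stk, SecretKey dek) throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AESWrap"); // RFC 3394 key wrap, shipped with the JDK
    cipher.init(Cipher.WRAP_MODE, stk);
    return cipher.wrap(dek);
  }

  /** Recovers the DEK; throws if the bytes were wrapped under a different STK. */
  static SecretKey unwrapDek(SecretKey stk, byte[] wrapped) throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AESWrap");
    cipher.init(Cipher.UNWRAP_MODE, stk);
    return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
  }
}
```

A design consequence worth reviewing: rotating the STK only requires re-wrapping the small DEKs, not re-encrypting store files, and an RFC 3394 unwrap under the wrong STK fails its integrity check instead of silently yielding a bad key.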
>
> *3. KeymetaAdmin API*
>
>    - enableKeyManagement(keyCust, keyNamespace) - Enable key management for
>    a custodian/namespace pair
>    - getManagedKeys(keyCust, keyNamespace) - Query key status and metadata
>    - rotateSTK() - Check for and propagate new system keys
>    - disableKeyManagement(keyCust, keyNamespace) - Disable all the keys for
>    a custodian/namespace (TBD)
>    - disableManagedKey(keyCust, keyNamespace, keyMetadataHash) - Disable a
>    specific key (TBD)
>    - rotateManagedKey(keyCust, keyNamespace) - Rotate the active key (TBD)
>    - refreshManagedKeys(keyCust, keyNamespace) - Refresh from external KMS
>    to validate all the keys. (TBD)
>    - Internal cache management operations for convenience and meeting SLAs.
>    (TBD)
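For discussing the API design question further down, here is a hypothetical, simplified Java shape of the three implemented (non-TBD) operations, plus an in-memory stand-in that only exercises the call sequence. The method names come from the list above; every parameter type, return type, the string key states, and the `latestStkInKms` field are assumptions, and the real KeymetaAdmin in the PR may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified shape of the implemented operations; not the PR's interface.
interface KeymetaAdminSketch {
  /** Enables key management for a custodian/namespace pair; returns the key states afterwards. */
  List<String> enableKeyManagement(String keyCust, String keyNamespace);

  /** Lists key states for a custodian/namespace pair; empty if management is not enabled. */
  List<String> getManagedKeys(String keyCust, String keyNamespace);

  /** Checks for a newer system key; returns true if one was found and propagated. */
  boolean rotateSTK();
}

/** In-memory stand-in; no real keys or RegionServers are involved. */
final class InMemoryKeymetaAdmin implements KeymetaAdminSketch {
  private final Map<String, List<String>> keysByPair = new HashMap<>();
  private int currentStk = 1;
  int latestStkInKms = 1; // hypothetical: newest system key version the KMS reports

  private static String pair(String keyCust, String keyNamespace) {
    return keyCust + "/" + keyNamespace;
  }

  @Override
  public List<String> enableKeyManagement(String keyCust, String keyNamespace) {
    // Idempotent here: enabling an already-enabled pair keeps its existing keys.
    List<String> keys = keysByPair.computeIfAbsent(pair(keyCust, keyNamespace),
        p -> new ArrayList<>(Collections.singletonList("ACTIVE")));
    return Collections.unmodifiableList(keys);
  }

  @Override
  public List<String> getManagedKeys(String keyCust, String keyNamespace) {
    return Collections.unmodifiableList(
        keysByPair.getOrDefault(pair(keyCust, keyNamespace), Collections.emptyList()));
  }

  @Override
  public boolean rotateSTK() {
    if (latestStkInKms <= currentStk) {
      return false; // nothing newer in the KMS
    }
    currentStk = latestStkInKms; // a real implementation would also notify every RegionServer
    return true;
  }
}
```

Whether `enableKeyManagement` should be idempotent, and whether `rotateSTK` should report "nothing to rotate" distinctly from failure, are exactly the kinds of contract details the review could pin down.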
>
> *4. Persistent Key Metadata Storage*
>
>    - New system table hbase:keymeta for storing key metadata and state,
>    which acts as an L2 cache.
>    - Tracks key lifecycle: ACTIVE, INACTIVE, DISABLED, FAILED states
>    - Stores wrapped DEKs and metadata for key lookup without depending on
>    external KMS.
>    - Optimized for high-priority access with in-memory column families
>    - Key metadata tracking with cryptographic hashes for integrity
>    verification
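A minimal sketch of the two ideas in this list: the four lifecycle states, and a metadata hash that is recomputed on read to detect tampering or corruption. Only the state names come from the list above; the encrypt/decrypt semantics of each state, SHA-256 as the hash, and the Base64 URL encoding of the printable form are assumptions, and the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical illustration; state semantics and hash algorithm are assumptions.
enum KeyStateSketch {
  ACTIVE, INACTIVE, DISABLED, FAILED;

  /** Assumption: only the ACTIVE key encrypts new data. */
  boolean usableForEncryption() {
    return this == ACTIVE;
  }

  /** Assumption: INACTIVE keys can still decrypt data written while they were active. */
  boolean usableForDecryption() {
    return this == ACTIVE || this == INACTIVE;
  }
}

final class KeyMetadataHash {
  /** SHA-256 over the UTF-8 metadata string (algorithm choice is an assumption). */
  static byte[] hash(String keyMetadata) {
    try {
      return MessageDigest.getInstance("SHA-256")
          .digest(keyMetadata.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is required on every Java platform", e);
    }
  }

  /** Printable form of the hash (encoding is an assumption). */
  static String encode(byte[] hash) {
    return Base64.getUrlEncoder().withoutPadding().encodeToString(hash);
  }

  /** Recomputes the hash of stored metadata and compares it with the stored value. */
  static boolean verify(String keyMetadata, byte[] storedHash) {
    return MessageDigest.isEqual(hash(keyMetadata), storedHash);
  }
}
```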
>
> *5. Multi-Layer Caching*
>
>    - L1: In-memory Caffeine cache on RegionServers for hot key data
>    - L2: Keymeta table for persistent key metadata that is shared across
>    all RegionServers.
>    - L3: Dynamic lookup from external KMS as fallback when not found in L2.
>    - Cache invalidation mechanism for key rotation scenarios
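The three layers compose as a read-through lookup: L1 first, then the shared keymeta table, and the external KMS only on a miss in both, with results written back so other RegionServers find them in L2. A minimal sketch, assuming a `ConcurrentHashMap` in place of Caffeine (to stay dependency-free), a plain `Map` in place of hbase:keymeta, and a function in place of the KMS call; the class and its names are hypothetical, not the PR's ManagedKeyDataCache.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through lookup across the three layers; not the PR's cache classes.
final class LayeredKeyLookup {
  private final Map<String, String> l1 = new ConcurrentHashMap<>(); // stand-in for Caffeine
  private final Map<String, String> l2;                             // stand-in for hbase:keymeta
  private final Function<String, String> kms;                       // L3; returns null if unknown
  final AtomicInteger kmsCalls = new AtomicInteger();

  LayeredKeyLookup(Map<String, String> keymetaTable, Function<String, String> kms) {
    this.l2 = keymetaTable;
    this.kms = kms;
  }

  Optional<String> get(String keyId) {
    String v = l1.get(keyId);
    if (v != null) {
      return Optional.of(v);                 // L1 hit
    }
    v = l2.get(keyId);
    if (v == null) {                         // L2 miss: fall back to the external KMS
      kmsCalls.incrementAndGet();
      v = kms.apply(keyId);
      if (v == null) {
        return Optional.empty();             // unknown everywhere; nothing is cached
      }
      l2.put(keyId, v);                      // persist so other RegionServers can find it
    }
    l1.put(keyId, v);                        // promote to L1
    return Optional.of(v);
  }

  /** Drops only the local copy, e.g. after a rotation; L2 is untouched in this sketch. */
  void invalidate(String keyId) {
    l1.remove(keyId);
  }
}
```

In this sketch, concurrent misses for the same key can each call the KMS; a real implementation would likely want single-flight loading per key, which is one of the caching bottleneck questions raised in the review areas.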
>
> *6. HBase Shell Integration*
>
>    - enable_key_management - Enable key management for a custodian and
>    namespace
>    - show_key_status - Display key status and metadata
>    - rotate_stk - Trigger system key rotation
>    - disable_key_management - Disable key management for a custodian and
>    namespace (TBD)
>    - disable_managed_key - Disable a specific key (TBD)
>    - rotate_managed_key - Rotate the active key (TBD)
>    - refresh_managed_keys - Refresh all keys for a custodian and namespace
>    (TBD)
>
> *Implementation Highlights*
>
>    - *Backward Compatibility:* Changes are fully compatible with existing
>    encryption-at-rest configuration
>    - *Gradual step-by-step migration*: Well defined migration path from
>    existing configuration to new configuration
>    - *Performance:* Minimal overhead through efficient caching and lazy key
>    loading
>    - *Security:* Cryptographic verification of key metadata, secure key
>    wrapping
>    - *Operability:* Administrative tools for key life cycle and cache
>    management
>    - *Extensibility:* Plugin architecture for custom key provider
>    implementations
>    - *Testing:* Comprehensive unit and integration test coverage
>
> *Architecture*
>
> The implementation follows a layered architecture:
>
>    1. *Provider Layer:* Pluggable ManagedKeyProvider for KMS integration
>    2. *Management Layer:* KeymetaAdmin API for administrative operations
>    3. *Persistence Layer:* KeymetaTableAccessor for metadata storage
>    4. *Cache Layer:* ManagedKeyDataCache and SystemKeyCache for performance
>    5. *Service Layer:* Coprocessor endpoints for client-server
>    communication
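The provider layer is what makes KMS integration pluggable. A minimal sketch of a KeyStore-backed provider, in the spirit of the existing KeyStoreKeyProvider and the new ManagedKeyStoreKeyProvider mentioned earlier; the interface name, its single `getKey` method, and the JCEKS store type are assumptions for illustration, and the real ManagedKeyProvider exposes considerably more.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import javax.crypto.SecretKey;

// Hypothetical pluggable provider abstraction; not the PR's ManagedKeyProvider.
interface SimpleKeyProviderSketch {
  /** Returns the key for an alias, or null if the provider has no such key. */
  Key getKey(String alias) throws GeneralSecurityException;
}

/** Provider backed by a java.security.KeyStore. */
final class KeyStoreBackedProvider implements SimpleKeyProviderSketch {
  private final KeyStore keyStore;
  private final char[] password;

  KeyStoreBackedProvider(KeyStore keyStore, char[] password) {
    this.keyStore = keyStore;
    this.password = password.clone();
  }

  @Override
  public Key getKey(String alias) throws GeneralSecurityException {
    return keyStore.getKey(alias, password); // null when the alias is absent
  }

  /** Builds an in-memory JCEKS store holding one secret key, e.g. for tests. */
  static KeyStore inMemoryStore(String alias, SecretKey key, char[] password)
      throws GeneralSecurityException {
    KeyStore ks = KeyStore.getInstance("JCEKS");
    try {
      ks.load(null, null); // initialize an empty store
    } catch (IOException e) {
      throw new IllegalStateException(e); // not expected without an input stream
    }
    ks.setEntry(alias, new KeyStore.SecretKeyEntry(key),
        new KeyStore.PasswordProtection(password));
    return ks;
  }
}
```

Swapping in a different backend (an external KMS client, for example) would mean another implementation of the same interface, leaving the cache, persistence, and management layers unchanged.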
>
> *Areas for Review*
>
> I would particularly appreciate feedback on:
>
>    1. *API Design:* Is the KeymetaAdmin API intuitive and complete for
>    common key management scenarios?
>    2. *Security Model:* Does the double-wrapping architecture (DEK wrapped
>    by STK, STK from KMS) provide appropriate security guarantees?
>    3. *Performance:* Are there potential bottlenecks in the caching
>    strategy or table access patterns?
>    4. *Operational Aspects:* Are the administrative commands sufficient for
>    the needs of operations and monitoring?
>    5. *Testing Coverage:* Are there additional test scenarios we should
>    cover?
>    6. *Documentation:* Is the design document clear? What additional
>    documentation would be helpful?
>    7. *Compatibility:* Any concerns about interaction with existing HBase
>    features?
>
> *Next Steps*
>
> After incorporating community feedback, I plan to:
>
>    1. Address any issues identified during review
>    2. Implement the work identified for future phases
>    3. Add additional documentation to the reference guide
>
> *How to Review*
>
> This PR introduces changes across multiple modules. Rather than reviewing
> all 143 files, I recommend focusing on these *core components* first:
>
> *Core Architecture:*
>
>    1. Design document (linked above) - architectural overview
>    2. ManagedKeyProvider, KeymetaAdmin, ManagedKeyData interfaces
>    (hbase-common)
>    3. ManagedKeys.proto - protocol definitions
>    4. HMaster and misc. procedure changes - initialization of keymeta in a
>    predictable order
>    5. FixedFileTrailer + reader/writer changes - encode/decode additional
>    encryption key in store files
>
> *Key Implementation:*
>
>    1. KeymetaAdminImpl, KeymetaTableAccessor, ManagedKeyUtils,
>    SystemKeyManager, SystemKeyAccessor - admin operations and persistence
>    2. ManagedKeyDataCache, SystemKeyCache - caching layer
>    3. SecurityUtil - encryption context creation
>
> *Client & Shell:*
>
>    1. KeymetaAdminClient - client API
>    2. Shell commands and Ruby wrappers
>
> *Tests & Examples:*
>
>    1. TestKeymetaAdminImpl, TestManagedKeymeta - for usage patterns
>    2. key_provider_keymeta_migration_test.rb - E2E migration steps
>
> *Note:* The remaining ~120 files contain secondary changes (API updates,
> test helpers, configuration constants, etc.) that can be reviewed later or
> skipped for initial feedback.
>
> Please feel free to comment directly on the PR, or reply to this thread
> with questions, concerns, or suggestions.
>
> Thank you for your time and expertise. Your feedback is invaluable in
> ensuring this feature meets the security and operational needs of HBase.
>
> Best regards,
> Hari Krishna Dara
>
