Hi Hari,

I find it interesting to see the different creative uses that people
imagine for HBase's physical ordering of bytes. Usually those
ideas are around region placement -- this is pretty clever! Outside of
your specific use-case, can you propose other reasons why HBase should
maintain such data partitioning? Are there other features that become
possible with this enabled? I'd like to work out if this should be a
default implementation of our storage system, or if this is going to
forever sit behind a disabled feature flag.

This does sound like a monstrous changeset. Do you have a strategy for
delivering this in reviewable pieces? Are there meaningful
refactorings that should be done to the HFile classes along the way?
Do you intend to backport to branch-2, or will you be content with
getting it onto branch-3?

Thanks,
Nick

On Mon, Feb 10, 2025 at 8:10 AM Hari Krishna Dara <harid...@gmail.com> wrote:
>
> Hi Everyone,
>
> HBase currently supports encryption at rest, but the entire HFile is
> encrypted with a single encryption key. We now have the need to use
> different encryption keys for different parts of HFile.
>
> Background on our requirements:
>
> * We consume HBase using Phoenix and take advantage of the multitenant
> features offered by Phoenix.
> * Roughly speaking, Phoenix maps the “Tenant ID” to the row prefix so
> all the data that belongs to the same tenant has rows starting with
> the corresponding ID.
> * We need to encrypt data belonging to each tenant with a different
> key (with an option for the tenant to manage their own keys).
> * There is no concept of a tenant in HBase but since the tenant ID
> maps mostly to the row prefix, we would like to satisfy the
> requirement by adding support for recognizing different encryption
> keys based on the row prefix and call it “Prefix Based Encryption”
> (PBE).
>
>
> Current encryption feature:
>
> * At the time of write:
>     * HBase either takes the encryption key from a user-set column
> family attribute or generates a new key for every HFile write.
>     * The key is then used to encrypt each block individually and is
> stored in the file trailer after being wrapped with a cluster wide
> master key. The master key itself is derived from the configured
> KeyProvider implementation.
> * At the time of read:
>     * The encryption key is discovered from HFile metadata and
> decrypted using the master key.
>     * The block is then decrypted and added to the block cache.
> * The entire process is abstracted out such that only the low level
> HFile read/write code worries about encryption and the rest of the
> HBase code needs no knowledge of encryption at all.
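
To make the wrap/unwrap step above concrete, here is a minimal sketch
using only the JDK's javax.crypto API. It is not HBase's actual code
path: the class name EnvelopeKeySketch, the 128-bit key size, and the
use of the JDK "AESWrap" transformation are all assumptions chosen for
illustration.

```java
import java.security.GeneralSecurityException;
import java.security.Key;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Hypothetical illustration of wrapping a per-file data key with a master key. */
final class EnvelopeKeySketch {

  /** Generates a random 128-bit AES key (stands in for both master and data keys). */
  static SecretKey newAesKey() {
    try {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(128);
      return generator.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Write side: wrap the data key so that only the wrapped form is stored with the file. */
  static byte[] wrap(SecretKey masterKey, SecretKey dataKey) {
    try {
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.WRAP_MODE, masterKey);
      return cipher.wrap(dataKey);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Read side: recover the data key from the stored bytes using the master key. */
  static SecretKey unwrap(SecretKey masterKey, byte[] wrappedKey) {
    try {
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.UNWRAP_MODE, masterKey);
      Key key = cipher.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY);
      return (SecretKey) key;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With PBE, as I read the proposal, the same wrap step would simply run
once per prefix key instead of once per file.
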
>
>
> Core of the proposed PBE Feature:
>
> * Enabling PBE:
>     * Configure a system-level boolean property and a PBEKeyProvider
> implementation to retrieve keys from an external KMS based on the
> given PBE prefix.
>     * Set a new table-level property to enable PBE and the length of
> the row prefix for key selection consideration.
> * Write Process:
>     * Ensure a block only contains data for the same PBE prefix.
>     * Encrypt each block with a key specific to the prefix.
>     * Store all encryption keys with the data.
> * Read Process:
>     * Discover prefixes and their encryption keys from HFile metadata.
>     * Decrypt blocks using the correct key and add them to the block cache.
> * A new table level property enables PBE and sets the length of the
> row prefix (with a system wide default) for key selection
> consideration.
> * At the time of write,
>     * An additional criterion is enforced on the block boundary to
> make sure a block only contains data for the same PBE prefix.
>     * Since each block uniformly belongs to data of a specific prefix,
> it will be encrypted with a key that is specific to the prefix.
>     * All the encryption keys will be encrypted and stored with the
> data, similar to what happens now.
> * At the time of read,
>     * The prefixes for the blocks of interest and their encryption
> keys are discovered from the HFile metadata.
>     * The blocks are decrypted using the correct key and added to the
> block cache.
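
The extra block-boundary criterion can be sketched as follows. This is
only an illustration of the rule "close the block when the prefix
changes", not the HFile writer: PrefixBlockSplitter is a hypothetical
name, and a maximum row count stands in for the real byte-size limit on
blocks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of prefix-aware block boundaries; not HBase code. */
final class PrefixBlockSplitter {

  /** Returns the first prefixLength bytes of a row key (the whole key if it is shorter). */
  static byte[] prefixOf(byte[] rowKey, int prefixLength) {
    return Arrays.copyOf(rowKey, Math.min(prefixLength, rowKey.length));
  }

  /**
   * Groups already-sorted row keys into blocks. A block is closed when it
   * is full (maxRowsPerBlock, a stand-in for the byte-size limit) or when
   * the next row has a different prefix, so no block mixes prefixes.
   */
  static List<List<byte[]>> split(List<byte[]> sortedRowKeys, int prefixLength,
      int maxRowsPerBlock) {
    List<List<byte[]>> blocks = new ArrayList<>();
    List<byte[]> current = new ArrayList<>();
    byte[] currentPrefix = null;
    for (byte[] rowKey : sortedRowKeys) {
      byte[] prefix = prefixOf(rowKey, prefixLength);
      boolean prefixChanged = currentPrefix != null && !Arrays.equals(prefix, currentPrefix);
      boolean blockFull = current.size() >= maxRowsPerBlock;
      if (!current.isEmpty() && (prefixChanged || blockFull)) {
        blocks.add(current);
        current = new ArrayList<>();
      }
      current.add(rowKey);
      currentPrefix = prefix;
    }
    if (!current.isEmpty()) {
      blocks.add(current);
    }
    return blocks;
  }
}
```

One consequence worth noting: tenants with few rows produce many small
blocks, since a prefix change always forces a boundary.
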
>
>
> High-Level Scope of Work:
>
> * In-Memory Cache for Keys (L1):
>     * Reduce dependency on KMS and improve performance.
> * New Meta Table: keymeta (L2):
>     * Persist keys, their metadata, usage stats, and status, for
> sharing across the cluster and to reduce dependency on the KMS.
>     * Facilitate key management operations such as activation,
> rotation, deactivation, and disabling.
> * HFile Format Changes:
>     * Store multiple keys, one for each unique PBE prefix.
>     * Facilitate preserving data that can’t be decrypted. (see below)
> * HFile Writer/Reader Changes:
>     * Update HFileContext to handle multiple keys.
>     * Implement writer and reader changes for data path and compactions.
>     * Track the number of blocks encrypted by each key to initiate
> internal rotation.
> * Administrative Interface:
>     * Provide an RPC interface for key injection and management.
>     * Add new HBase shell commands for accessing RPC calls.
> * Default PBEKeyProvider Implementation:
>     * Implement on top of keystore access.
> * Master Key Rotation:
>     * Implement new administrative operations to rotate the master key.
> * New HFile format:
>     * Separate section for each PBE prefix.
>     * Each section is formatted as a logical HFile.
>     * New section index at physical HFile to navigate sections.
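
As I understand the L1/L2 items in this list, a key lookup would fall
through an in-memory cache, then the keymeta table, then the KMS. A
rough sketch of that order, with every name hypothetical: a plain Map
stands in for the keymeta table, a Function stands in for the
PBEKeyProvider/KMS call, and prefixes are Strings for simplicity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of the L1 cache -> keymeta (L2) -> KMS lookup order. */
final class TieredKeyLookup {
  // L1: per-process in-memory cache.
  private final Map<String, byte[]> l1 = new ConcurrentHashMap<>();
  // L2: stand-in for the proposed keymeta table shared across the cluster.
  private final Map<String, byte[]> keymeta;
  // Stand-in for the external KMS call made through a key provider.
  private final Function<String, byte[]> kms;

  TieredKeyLookup(Map<String, byte[]> keymeta, Function<String, byte[]> kms) {
    this.keymeta = keymeta;
    this.kms = kms;
  }

  /** Returns the key for a prefix, consulting the KMS only when both tiers miss. */
  byte[] keyFor(String prefix) {
    return l1.computeIfAbsent(prefix, p -> keymeta.computeIfAbsent(p, kms));
  }
}
```

This ignores key status and rotation entirely; it only shows why a
second process sharing the keymeta table would not need to reach the
KMS for a prefix that another process has already resolved.
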
>
>
> Need for new HFile format:
>
> * When a key is disabled, existing data encrypted by the key can no
> longer be decrypted.
> * The key can be reactivated in the future, allowing such data to be
> read, which means compactions need to preserve this data.
> * Sections make it easier to carry forward this data as a whole from
> one HFile to another, with their own indexes etc.
>
>
> Thank you,
> Hari
