Hi Everyone,

HBase currently supports encryption at rest, but the entire HFile is
encrypted with a single encryption key. We now need to use
different encryption keys for different parts of an HFile.

Background on our requirements:

* We consume HBase using Phoenix and take advantage of the multitenant
features offered by Phoenix.
* Roughly speaking, Phoenix maps the “Tenant ID” to the row prefix so
all the data that belongs to the same tenant has rows starting with
the corresponding ID.
* We need to encrypt data belonging to each tenant with a different
key (with an option for the tenant to manage their own keys).
* There is no concept of a tenant in HBase, but since the tenant ID
maps mostly to the row prefix, we would like to satisfy the
requirement by adding support for selecting different encryption
keys based on the row prefix, a feature we call “Prefix Based
Encryption” (PBE).
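To make the prefix idea concrete, here is a minimal Java sketch of how a row key might map to its PBE prefix, assuming the prefix is simply the first N bytes of the row key (N being the configured prefix length). The class and method names are hypothetical and are not part of HBase or Phoenix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper (not an HBase or Phoenix API): the PBE prefix of a row
 * is assumed to be its first prefixLength bytes, as configured per table.
 */
public class PbePrefix {

  /** Returns the first prefixLength bytes of rowKey, i.e. its PBE prefix. */
  public static byte[] prefixOf(byte[] rowKey, int prefixLength) {
    if (prefixLength <= 0) {
      throw new IllegalArgumentException("prefixLength must be positive");
    }
    if (rowKey.length < prefixLength) {
      // Assumption: the proposal does not say how rows shorter than the
      // prefix are handled; this sketch simply rejects them.
      throw new IllegalArgumentException("row key is shorter than the PBE prefix");
    }
    return Arrays.copyOf(rowKey, prefixLength);
  }

  public static void main(String[] args) {
    // Assumed layout: a 5-byte tenant ID leads the row key, as with a
    // fixed-width tenant ID column in a Phoenix multitenant table.
    byte[] row = "T0001|order-42".getBytes(StandardCharsets.UTF_8);
    String prefix = new String(prefixOf(row, 5), StandardCharsets.UTF_8);
    System.out.println(prefix); // T0001
  }
}
```

Rows of the same tenant, e.g. "T0001|order-42" and "T0001|order-43", yield the same prefix and would therefore resolve to the same key.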


Current encryption feature:

* At the time of write:
    * HBase either uses a key supplied by the user through a column
family attribute or generates a new key for every HFile write.
    * The key is then used to encrypt each block individually and is
stored in the file trailer after being wrapped with a cluster-wide
master key. The master key itself is retrieved from the configured
KeyProvider implementation.
* At the time of read:
    * The encryption key is discovered from HFile metadata and
decrypted using the master key.
    * The block is then decrypted and added to the block cache.
* The entire process is abstracted out such that only the low level
HFile read/write code worries about encryption and the rest of the
HBase code needs no knowledge of encryption at all.
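For reference, the current single-key flow can be sketched with standard JCE ciphers as stand-ins: AES key wrap ("AESWrap") for wrapping the per-file key under the master key, and AES-GCM for block encryption. This is only an illustration under those assumptions; HBase's own Encryption and KeyProvider classes differ in the details, and all names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/**
 * Sketch of today's single-key scheme using standard JCE ciphers as stand-ins:
 * one data key per HFile encrypts every block, and the data key is stored
 * only in wrapped form (encrypted under the cluster master key).
 */
public class SingleKeyHFileSketch {
  private static final SecureRandom RANDOM = new SecureRandom();
  private static final int IV_LEN = 12;    // GCM nonce length in bytes
  private static final int TAG_BITS = 128; // GCM authentication tag length

  static SecretKey newAesKey() throws GeneralSecurityException {
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(128);
    return gen.generateKey();
  }

  /** Write side: wrap the per-file data key with the master key (RFC 3394 AES key wrap). */
  static byte[] wrapDataKey(SecretKey master, SecretKey dataKey) throws GeneralSecurityException {
    Cipher c = Cipher.getInstance("AESWrap");
    c.init(Cipher.WRAP_MODE, master);
    return c.wrap(dataKey);
  }

  /** Read side: recover the data key from the wrapped bytes found in the file metadata. */
  static SecretKey unwrapDataKey(SecretKey master, byte[] wrapped) throws GeneralSecurityException {
    Cipher c = Cipher.getInstance("AESWrap");
    c.init(Cipher.UNWRAP_MODE, master);
    return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
  }

  /** Encrypts one block; a fresh random IV is prepended to the ciphertext. */
  static byte[] encryptBlock(SecretKey key, byte[] block) throws GeneralSecurityException {
    byte[] iv = new byte[IV_LEN];
    RANDOM.nextBytes(iv);
    Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
    c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
    byte[] ct = c.doFinal(block);
    byte[] out = new byte[IV_LEN + ct.length];
    System.arraycopy(iv, 0, out, 0, IV_LEN);
    System.arraycopy(ct, 0, out, IV_LEN, ct.length);
    return out;
  }

  /** Decrypts a block produced by encryptBlock (IV first, then ciphertext). */
  static byte[] decryptBlock(SecretKey key, byte[] stored) throws GeneralSecurityException {
    Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
    c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_LEN));
    return c.doFinal(stored, IV_LEN, stored.length - IV_LEN);
  }

  /** Full write-then-read round trip for a single block. */
  static String roundTrip(String blockText) throws GeneralSecurityException {
    SecretKey master = newAesKey();  // stand-in for the KeyProvider's master key
    SecretKey dataKey = newAesKey(); // per-HFile data key
    byte[] trailerEntry = wrapDataKey(master, dataKey);
    byte[] storedBlock = encryptBlock(dataKey, blockText.getBytes(StandardCharsets.UTF_8));

    SecretKey recovered = unwrapDataKey(master, trailerEntry);
    return new String(decryptBlock(recovered, storedBlock), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws GeneralSecurityException {
    System.out.println(roundTrip("row1/cf:q/value")); // row1/cf:q/value
  }
}
```

Only the wrapped form of the data key ever reaches the file, so reading requires access to the master key; PBE keeps this model and extends it to one data key per prefix.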


Core of the proposed PBE Feature:

* Enabling PBE:
    * Configure a system-level boolean property and a PBEKeyProvider
implementation to retrieve keys from an external KMS based on the
given PBE prefix.
    * Set a new table-level property to enable PBE and to specify the
length of the row prefix used for key selection (with a system-wide
default).
* Write Process:
    * An additional criterion is enforced on the block boundary to
make sure a block only contains data for the same PBE prefix.
    * Since each block belongs entirely to a single prefix, it is
encrypted with a key specific to that prefix.
    * All the encryption keys are encrypted (wrapped) and stored with
the data, similar to what happens now.
* Read Process:
    * The prefixes for the blocks of interest and their encryption
keys are discovered from the HFile metadata.
    * The blocks are decrypted using the correct key and added to the
block cache.
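The block-boundary criterion on the write path can be sketched as follows. This is a hypothetical, simplified model (String rows, block size counted in rows rather than bytes), not the actual HFile writer change; its point is that a block is closed either on the usual size limit or when the PBE prefix changes, so each finished block maps to exactly one prefix key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the PBE block-boundary rule: a block is closed when it
 * is full OR when the next row has a different PBE prefix, so every block holds
 * data for exactly one prefix and can be encrypted with that prefix's key.
 * Simplifications: rows are Strings and block size is counted in rows, whereas
 * real HFile blocks hold Cells and are sized in bytes.
 */
public class PbeBlockSplitter {

  /** A finished block: the PBE prefix it belongs to and the rows it holds. */
  public static final class Block {
    public final String prefix;
    public final List<String> rows;

    Block(String prefix, List<String> rows) {
      this.prefix = prefix;
      this.rows = rows;
    }
  }

  /** Splits rows (already sorted, as HFile writers require) into single-prefix blocks. */
  public static List<Block> split(List<String> sortedRows, int prefixLength, int maxRowsPerBlock) {
    if (maxRowsPerBlock <= 0) {
      throw new IllegalArgumentException("maxRowsPerBlock must be positive");
    }
    List<Block> blocks = new ArrayList<>();
    List<String> current = new ArrayList<>();
    String currentPrefix = null;
    for (String row : sortedRows) {
      String prefix = row.substring(0, prefixLength);
      boolean prefixChanged = currentPrefix != null && !prefix.equals(currentPrefix);
      boolean blockFull = current.size() >= maxRowsPerBlock;
      if (prefixChanged || blockFull) {
        // Existing size criterion plus the new PBE prefix criterion.
        blocks.add(new Block(currentPrefix, current));
        current = new ArrayList<>();
      }
      current.add(row);
      currentPrefix = prefix;
    }
    if (!current.isEmpty()) {
      blocks.add(new Block(currentPrefix, current));
    }
    return blocks;
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("T0001|a", "T0001|b", "T0001|c", "T0002|a");
    for (Block b : split(rows, 5, 2)) {
      // On the write path, b.prefix would select the encryption key for this block.
      System.out.println(b.prefix + " " + b.rows);
    }
  }
}
```

With a prefix length of 5 and at most 2 rows per block, the rows above produce three blocks: two for T0001 (split by size) and one for T0002 (split by prefix).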


High-Level Scope of Work:

* In-Memory Cache for Keys (L1):
    * Reduce dependency on KMS and improve performance.
* New Meta Table: keymeta (L2):
    * Persist keys, their metadata, usage stats and status so they
can be shared across the cluster and to further reduce dependency on
the KMS.
    * Facilitate key management operations such as activation,
rotation, deactivation, and disabling.
* HFile Format Changes:
    * Store multiple keys, one for each unique PBE prefix.
    * Facilitate preserving data that can’t be decrypted. (see below)
* HFile Writer/Reader Changes:
    * Update HFileContext to handle multiple keys.
    * Implement writer and reader changes for the data path and for compactions.
    * Track the number of blocks encrypted by each key to initiate
internal rotation.
* Administrative Interface:
    * Provide an RPC interface for key injection and management.
    * Add new HBase shell commands for accessing RPC calls.
* Default PBEKeyProvider Implementation:
    * Implement on top of keystore access.
* Master Key Rotation:
    * Implement new administrative operations to rotate the master key.
* New HFile format:
    * Separate section for each PBE prefix.
    * Each section is formatted as a logical HFile.
    * A new section index in the physical HFile to navigate between sections.
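The L1/L2 split in the scope above implies a lookup order that might look like this minimal sketch: in-memory cache first, then the keymeta table, and only then the external KMS. The class is hypothetical; keymeta and the PBEKeyProvider are modeled as an in-memory map and a function, and keys as Strings.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch of the key lookup order: in-memory cache (L1), then the
 * keymeta table (L2), then the external KMS via the PBEKeyProvider. Keys are
 * modeled as Strings; the L2 table and the KMS are in-memory stand-ins.
 */
public class PbeKeyLookup {
  private final Map<String, String> l1Cache = new ConcurrentHashMap<>(); // per server
  private final Map<String, String> keymetaTable;                        // shared by all servers
  private final Function<String, String> kms;                            // prefix -> key material
  private final AtomicInteger kmsCalls = new AtomicInteger();

  public PbeKeyLookup(Map<String, String> keymetaTable, Function<String, String> kms) {
    this.keymetaTable = keymetaTable;
    this.kms = kms;
  }

  public String getKey(String prefix) {
    // L1 miss: try the persisted key in keymeta before going to the KMS.
    return l1Cache.computeIfAbsent(prefix, p -> keymetaTable.computeIfAbsent(p, q -> {
      kmsCalls.incrementAndGet();
      return kms.apply(q);
    }));
  }

  public int kmsCalls() {
    return kmsCalls.get();
  }

  public static void main(String[] args) {
    Map<String, String> keymeta = new ConcurrentHashMap<>();
    Function<String, String> kms = prefix -> "kms-key-" + prefix;

    PbeKeyLookup server1 = new PbeKeyLookup(keymeta, kms);
    server1.getKey("T0001"); // KMS call, key persisted in keymeta
    server1.getKey("T0001"); // L1 hit

    PbeKeyLookup server2 = new PbeKeyLookup(keymeta, kms);
    server2.getKey("T0001"); // L2 hit, no KMS call

    System.out.println(server1.kmsCalls() + " " + server2.kmsCalls()); // 1 0
  }
}
```

The second instance stands in for another RegionServer (or a restart with a cold L1 cache): it finds the key in keymeta and never calls the KMS, which is the motivation for persisting keys in L2.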


Need for new HFile format:

* When a key is disabled, existing data encrypted by the key can no
longer be decrypted.
* Since the key can be reactivated in the future, making such data
readable again, compactions need to preserve this data.
* Sections make it easier to carry forward this data as a whole from
one HFile to another, with their own indexes etc.
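The carry-forward idea can be illustrated with a hypothetical compaction planner over sectioned files. It assumes that each input section belongs to one PBE prefix and that sections for prefixes with a disabled key are copied verbatim (one output section per input section), while all other prefixes are merged as usual; none of these names come from HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch of compaction over sectioned HFiles: input sections are
 * grouped by PBE prefix (sorted, like a section index). Prefixes whose key is
 * disabled cannot be decrypted, so their sections are carried forward verbatim;
 * all other prefixes would be decrypted, merged and re-encrypted as usual.
 */
public class SectionCompactionSketch {

  /** One section of a physical HFile: a logical HFile for a single prefix. */
  public static final class Section {
    public final String prefix;
    public final byte[] encryptedBytes; // opaque without the prefix's key

    public Section(String prefix, byte[] encryptedBytes) {
      this.prefix = prefix;
      this.encryptedBytes = encryptedBytes;
    }
  }

  /** Returns one action per prefix, in prefix order. */
  public static List<String> planCompaction(List<List<Section>> inputFiles, Set<String> disabledPrefixes) {
    Map<String, List<Section>> byPrefix = new TreeMap<>();
    for (List<Section> file : inputFiles) {
      for (Section s : file) {
        byPrefix.computeIfAbsent(s.prefix, k -> new ArrayList<>()).add(s);
      }
    }
    List<String> plan = new ArrayList<>();
    for (Map.Entry<String, List<Section>> e : byPrefix.entrySet()) {
      String action = disabledPrefixes.contains(e.getKey()) ? "CARRY_FORWARD" : "MERGE";
      plan.add(action + " " + e.getKey() + " x" + e.getValue().size());
    }
    return plan;
  }

  public static void main(String[] args) {
    List<Section> file1 = Arrays.asList(new Section("T0001", new byte[3]), new Section("T0002", new byte[2]));
    List<Section> file2 = Arrays.asList(new Section("T0001", new byte[4]), new Section("T0003", new byte[1]));
    Set<String> disabled = Collections.singleton("T0002");
    planCompaction(Arrays.asList(file1, file2), disabled).forEach(System.out::println);
    // MERGE T0001 x2
    // CARRY_FORWARD T0002 x1
    // MERGE T0003 x1
  }
}
```

Because each section is self-contained, carrying one forward does not require reading or re-indexing its contents, which is what makes preserving undecryptable data cheap.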


Thank you,
Hari
