Hi everyone,

HBase currently supports encryption at rest, but an entire HFile is encrypted with a single encryption key. We now need to use different encryption keys for different parts of an HFile.
Background on our requirements:
* We consume HBase using Phoenix and take advantage of the multitenancy features offered by Phoenix.
* Roughly speaking, Phoenix maps the “Tenant ID” to the row prefix, so all the data that belongs to a tenant has rows starting with the corresponding ID.
* We need to encrypt each tenant's data with a different key (with an option for tenants to manage their own keys).
* There is no concept of a tenant in HBase, but since the tenant ID maps mostly to the row prefix, we would like to satisfy the requirement by adding support for selecting different encryption keys based on the row prefix. We call this “Prefix Based Encryption” (PBE).

Current encryption feature:
* At the time of write:
  * HBase determines the encryption key either from a user-set column family attribute or by generating a new key for every HFile write.
  * The key is then used to encrypt each block individually, and is stored in the file trailer after being wrapped with a cluster-wide master key. The master key itself is obtained from the configured KeyProvider implementation.
* At the time of read:
  * The encryption key is discovered from the HFile metadata and decrypted using the master key.
  * Each block is then decrypted and added to the block cache.
* The entire process is abstracted such that only the low-level HFile read/write code deals with encryption; the rest of the HBase code needs no knowledge of encryption at all.

Core of the proposed PBE feature:
* Enabling PBE:
  * Configure a system-level boolean property and a PBEKeyProvider implementation that retrieves keys from an external KMS based on the given PBE prefix.
  * Set a new table-level property to enable PBE and to specify the length of the row prefix used for key selection.
* Write Process:
  * Ensure a block only contains data for a single PBE prefix.
  * Encrypt each block with a key specific to its prefix.
  * Store all encryption keys with the data.
* Read Process:
  * Discover prefixes and their encryption keys from the HFile metadata.
  * Decrypt blocks using the correct key and add them to the block cache.

In more detail:
* A new table-level property enables PBE and sets the length of the row prefix (with a system-wide default) used for key selection.
* At the time of write:
  * An additional criterion is enforced on block boundaries to make sure a block only contains data for a single PBE prefix.
  * Since each block belongs entirely to a single prefix, it is encrypted with a key specific to that prefix.
  * All the encryption keys are encrypted and stored with the data, similar to what happens now.
* At the time of read:
  * The prefixes of the blocks of interest, and their encryption keys, are discovered from the HFile metadata.
  * The blocks are decrypted using the correct key and added to the block cache.

High-Level Scope of Work:
* In-Memory Cache for Keys (L1):
  * Reduce dependency on the KMS and improve performance.
* New Meta Table, keymeta (L2):
  * Persist keys along with their metadata, usage statistics, and status, so they can be shared across the cluster and the dependency on the KMS is reduced.
  * Facilitate key management operations such as activation, rotation, deactivation, and disabling.
* HFile Format Changes:
  * Store multiple keys, one for each unique PBE prefix.
  * Facilitate preserving data that can't be decrypted (see below).
* HFile Writer/Reader Changes:
  * Update HFileContext to handle multiple keys.
  * Implement writer and reader changes for the data path and for compactions.
  * Track the number of blocks encrypted by each key to initiate internal rotation.
* Administrative Interface:
  * Provide an RPC interface for key injection and management.
  * Add new HBase shell commands for invoking these RPCs.
* Default PBEKeyProvider Implementation:
  * Implement on top of keystore access.
* Master Key Rotation:
  * Implement new administrative operations to rotate the master key.
* New HFile format:
  * A separate section for each PBE prefix.
  * Each section is formatted as a logical HFile.
  * A new section index in the physical HFile to navigate between sections.

Need for the new HFile format:
* When a key is disabled, existing data encrypted by that key can no longer be decrypted.
* The key can be reactivated in the future, allowing such data to be read again, which means compactions need to preserve this data.
* Sections make it easier to carry this data forward as a whole from one HFile to another, along with its own indexes, etc.

Thank you,
Hari
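For illustration, the write process described above (the additional block-boundary criterion plus per-prefix key selection and per-key block counting) can be sketched as follows. This is a minimal, hypothetical Python sketch, not HBase code: `write_blocks`, `Block`, and `key_for_prefix` are illustrative names, a cell-count limit stands in for HBase's byte-based block size, and the actual cipher calls are omitted, so each block only records which key would encrypt it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sketch, not HBase API. A cell is simplified to (row_key, value);
# cells are assumed to arrive sorted by row key, as they do when an HFile is written.
Cell = Tuple[bytes, bytes]


@dataclass
class Block:
    prefix: bytes  # the single PBE prefix shared by every cell in this block
    key_id: str    # id of the prefix-specific key that would encrypt this block
    cells: List[Cell] = field(default_factory=list)


def write_blocks(
    cells: List[Cell],
    prefix_length: int,
    max_cells_per_block: int,  # assumption: stands in for a byte-based block size
    key_for_prefix: Callable[[bytes], str],
) -> Tuple[List[Block], Dict[str, int]]:
    """Split sorted cells into blocks that never span two PBE prefixes."""
    blocks: List[Block] = []
    blocks_per_key: Dict[str, int] = {}  # per-key usage, e.g. to trigger rotation
    current: Optional[Block] = None
    for row, value in cells:
        prefix = row[:prefix_length]
        # Start a new block when the current one is full (the normal boundary)
        # or when the PBE prefix changes (the additional PBE boundary criterion).
        if (
            current is None
            or current.prefix != prefix
            or len(current.cells) >= max_cells_per_block
        ):
            current = Block(prefix, key_for_prefix(prefix))
            blocks.append(current)
            blocks_per_key[current.key_id] = blocks_per_key.get(current.key_id, 0) + 1
        current.cells.append((row, value))
    return blocks, blocks_per_key


# Usage: two tenants whose IDs are the 2-byte row prefixes b"t1" and b"t2".
keys = {b"t1": "key-t1", b"t2": "key-t2"}
cells = [(b"t1-a", b"1"), (b"t1-b", b"2"), (b"t1-c", b"3"), (b"t2-a", b"4")]
blocks, counts = write_blocks(cells, 2, 2, keys.__getitem__)
print([(b.prefix, b.key_id, len(b.cells)) for b in blocks])
# [(b't1', 'key-t1', 2), (b't1', 'key-t1', 1), (b't2', 'key-t2', 1)]
```

Note that the block for `t2-a` is cut even though the previous block is not full; that is the extra boundary the proposal adds, and it is what lets each block be encrypted with exactly one prefix key.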
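The L1 in-memory cache and the L2 keymeta table from the scope of work could form a read-through lookup chain roughly like the one below. This is a hypothetical sketch: `PrefixKeyCache`, `kms_fetch`, and the plain dict standing in for the keymeta table are illustrative only. A real implementation would store keys wrapped with the master key and would also track key status and usage statistics, none of which is modeled here.

```python
from typing import Callable, Dict


class PrefixKeyCache:
    """Hypothetical lookup chain: L1 in-memory cache, then L2 keymeta, then the KMS."""

    def __init__(self, keymeta: Dict[bytes, bytes], kms_fetch: Callable[[bytes], bytes]):
        self._l1: Dict[bytes, bytes] = {}  # per-process in-memory cache (L1)
        self._keymeta = keymeta            # stands in for the proposed keymeta table (L2)
        self._kms_fetch = kms_fetch        # stands in for a PBEKeyProvider backed by a KMS
        self.kms_calls = 0

    def get_key(self, prefix: bytes) -> bytes:
        key = self._l1.get(prefix)
        if key is not None:
            return key
        key = self._keymeta.get(prefix)
        if key is None:
            # Only a miss in both L1 and L2 reaches the external KMS.
            self.kms_calls += 1
            key = self._kms_fetch(prefix)
            self._keymeta[prefix] = key  # persist so other servers can reuse it
        self._l1[prefix] = key
        return key


keymeta: Dict[bytes, bytes] = {b"t2": b"key-from-keymeta"}
cache = PrefixKeyCache(keymeta, lambda prefix: b"kms-key-" + prefix)
cache.get_key(b"t1")  # L1 and L2 miss: fetched from the KMS and persisted to keymeta
cache.get_key(b"t1")  # served from L1
cache.get_key(b"t2")  # served from L2 without a KMS call
print(cache.kms_calls)  # 1
```

The point of the two levels is that the KMS is contacted once per prefix across the cluster (after which keymeta serves other servers), and once per process at most for the cache.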
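The section index and the reason for sections can be illustrated together. In this hypothetical sketch, `SectionEntry`, `SectionIndex`, and `plan_compaction` are not existing HBase classes. The index maps each PBE prefix to the byte range of its logical HFile, a row is located by binary search on its prefix, and during compaction the sections whose key is disabled are marked to be copied verbatim instead of being decrypted and rewritten. Merging sections from several input files is left out for brevity.

```python
import bisect
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class SectionEntry:
    prefix: bytes  # PBE prefix whose data this section (a logical HFile) holds
    offset: int    # byte offset of the section within the physical HFile
    length: int    # section length in bytes


class SectionIndex:
    """Hypothetical section index stored in the physical HFile."""

    def __init__(self, entries: List[SectionEntry]):
        self._entries = sorted(entries, key=lambda e: e.prefix)
        self._prefixes = [e.prefix for e in self._entries]

    def find(self, row: bytes, prefix_length: int) -> Optional[SectionEntry]:
        """Binary-search for the section that owns the row's PBE prefix."""
        prefix = row[:prefix_length]
        i = bisect.bisect_left(self._prefixes, prefix)
        if i < len(self._prefixes) and self._prefixes[i] == prefix:
            return self._entries[i]
        return None

    def plan_compaction(
        self, key_enabled: Callable[[bytes], bool]
    ) -> Tuple[List[SectionEntry], List[SectionEntry]]:
        """Split sections into (rewrite, copy_verbatim).

        Sections whose key is disabled cannot be decrypted, so they are carried
        forward byte-for-byte (with their own indexes) instead of being rewritten.
        """
        rewrite = [e for e in self._entries if key_enabled(e.prefix)]
        copy_verbatim = [e for e in self._entries if not key_enabled(e.prefix)]
        return rewrite, copy_verbatim


index = SectionIndex([
    SectionEntry(b"t2", 4096, 2048),
    SectionEntry(b"t1", 0, 4096),
    SectionEntry(b"t3", 6144, 1024),
])
print(index.find(b"t2-row42", 2))  # SectionEntry(prefix=b't2', offset=4096, length=2048)
print(index.find(b"t9-row1", 2))   # None
rewrite, copy_verbatim = index.plan_compaction(lambda p: p != b"t2")
print([e.prefix for e in copy_verbatim])  # [b't2']
```

Because each section is a self-contained logical HFile, the `copy_verbatim` sections can be moved into the compacted file without decrypting them, which is exactly what preserving data under a disabled (but possibly reactivated) key requires.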