[ https://issues.apache.org/jira/browse/HIVE-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875247#comment-16875247 ]
Xinli Shang edited comment on HIVE-21848 at 6/28/19 9:39 PM:
-------------------------------------------------------------
Hi [~owen.omalley], yes, I looked at HadoopShims.java earlier. I still
remember your very clever workaround that avoids needing two round trips to the
KMS to generate and encrypt a working key; it cut that traffic in half.
For the nested column questions above, I generally agree with the approach.
There are only a few corner cases that we need to discuss.
For the example above, "name: struct<first:string,last:string>", suppose the
table properties contain the entry "encrypt.columns" =
"pii:name;other_category:name.first". What should we do? Should we throw an
exception, or ignore "other_category:name.first" and let the parent's setting
override it?
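A minimal Python sketch of the two options raised above, assuming the "encrypt.columns" value is a ";"-separated list of "key:columns" groups with dotted paths for nested fields (accepting several comma-separated columns per key is also an assumption). `parse_encrypt_columns` and its `policy` values are hypothetical names for illustration, not an existing Hive, ORC, or Parquet API.

```python
def parse_encrypt_columns(value, policy="error"):
    """Map column paths to key names from a hypothetical "encrypt.columns" value.

    policy="error":       raise if a column sits under an ancestor with a different key.
    policy="parent_wins": drop such nested entries; the ancestor's key covers them.
    """
    if policy not in ("error", "parent_wins"):
        raise ValueError(f"unknown policy: {policy!r}")

    # First pass: collect every key:column assignment as written.
    assignments = {}
    for group in value.split(";"):
        group = group.strip()
        if not group:
            continue
        key, sep, cols = group.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {group!r}")
        key = key.strip()
        for col in cols.split(","):
            col = col.strip()
            if not col:
                continue
            if assignments.get(col, key) != key:
                raise ValueError(f"column {col!r} is assigned to two keys")
            assignments[col] = key

    # Second pass: resolve columns nested under another encrypted column.
    result = {}
    for col, key in assignments.items():
        parts = col.split(".")
        # Proper ancestors, shallowest first: "a.b.c" -> ["a", "a.b"].
        ancestors = [".".join(parts[:i]) for i in range(1, len(parts))]
        parent = next((a for a in ancestors if a in assignments), None)
        if parent is None:
            result[col] = key
        elif policy == "parent_wins" or assignments[parent] == key:
            continue  # the ancestor's key already encrypts this column
        else:
            raise ValueError(
                f"{col!r} (key {key!r}) is nested under {parent!r} "
                f"(key {assignments[parent]!r})"
            )
    return result


print(parse_encrypt_columns("pii:name;other_category:name.first", policy="parent_wins"))
# {'name': 'pii'}
```

With the default policy the same value raises `ValueError`. In this sketch a nested entry that repeats its parent's key is treated as redundant under either policy, since it changes nothing.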
Do we allow some leaf columns to be excluded from encryption when their parent
is specified to be encrypted? I expect people will request that feature
once this is rolled out.
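If exclusions were allowed, a writer would have to expand each encrypted parent to its leaf columns and then subtract the excluded ones. A minimal sketch, assuming the schema has been flattened into dotted leaf paths; `leaf_columns_to_encrypt` and the `excluded` list are hypothetical, since no exclusion syntax has been proposed.

```python
def leaf_columns_to_encrypt(leaf_paths, encrypted, excluded=()):
    """Return the leaf columns covered by `encrypted` paths, minus `excluded` ones.

    A path covers itself and every descendant ("name" covers "name.first").
    """
    def covers(prefix, path):
        return path == prefix or path.startswith(prefix + ".")

    return [
        leaf
        for leaf in leaf_paths
        if any(covers(p, leaf) for p in encrypted)
        and not any(covers(x, leaf) for x in excluded)
    ]


# Leaves of a table with columns id and name: struct<first:string,last:string>.
leaves = ["id", "name.first", "name.last"]
print(leaf_columns_to_encrypt(leaves, encrypted=["name"], excluded=["name.first"]))
# ['name.last']
```

Matching on whole path segments (`prefix + "."`) keeps a column such as "names.x" from being treated as a child of "name".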
That said, I am not objecting to the proposal; these are just some thoughts on
corner cases.
> Table property name definition between ORC and Parquet encryption
> -----------------------------------------------------------------
>
> Key: HIVE-21848
> URL: https://issues.apache.org/jira/browse/HIVE-21848
> Project: Hive
> Issue Type: Task
> Components: Metastore
> Affects Versions: 3.0.0
> Reporter: Xinli Shang
> Assignee: Xinli Shang
> Priority: Major
> Fix For: 3.0.0
>
>
> The goal of this Jira is to define a superset of unified table property names
> that can be used for both Parquet and ORC column encryption. There is no code
> change needed for this Jira.
> *Background:*
> ORC-14 and Parquet-1178 introduced column encryption to ORC and Parquet. To
> configure encryption (e.g. which columns are sensitive, which master key to
> use, which algorithm, etc.), table properties can be used. It is important
> that Parquet and ORC use unified property names.
> According to the slides at
> [https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
> ORC uses table properties such as orc.encrypt.pii and orc.encrypt.credit. The
> Parquet community, meanwhile, is still discussing several ways to provide this
> configuration; table properties are one option, but there is no detailed
> design of the property names yet.
> So this is a good time for the two communities to agree on a superset of
> unified table property names.
> *Proposal:*
> There are several encryption properties that need to be specified for a
> table. The list below is the superset of Parquet and ORC; some items might
> not apply to both.
> # PII columns, including nested columns
> # Column key metadata and master key metadata
> # Encryption algorithm; for example, Parquet supports AES_GCM and AES_CTR, while ORC might support AES_CTR.
> # Footer encryption: Parquet allows the footer to be either encrypted or plaintext
> # Footer key metadata
> Here is the proposed set of table properties.
> |*Table Property Name*|*Value*|*Notes*|
> |encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption.|
> |encrypt_footer_plaintext|true, false|Parquet supports both plaintext and encrypted footers. By default, the footer is encrypted.|
> |encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to the KMS to define what the key metadata is. The metadata should contain enough information for the KMS to identify the corresponding key.|
> |encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column name, for example ‘address.zipcode’. It is up to the KMS to define what the key metadata is. The metadata should contain enough information for the KMS to identify the corresponding key.|
>
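A minimal Python sketch of how a writer might read the proposed properties into a single configuration object. The property names and value formats come from the table above; `EncryptionConfig` and `parse_table_properties` are hypothetical names, the key metadata is kept as opaque bytes because its meaning is left to the KMS, algorithm names are lowercased before checking (an assumption), and unrelated table properties are ignored.

```python
import base64
from dataclasses import dataclass, field
from typing import Dict, Optional

ALGORITHMS = {"aes_ctr", "aes_gcm"}
COLUMN_PREFIX = "encrypt_col_"


@dataclass
class EncryptionConfig:
    algorithm: Optional[str] = None
    footer_plaintext: bool = False  # per the proposal, the footer is encrypted by default
    footer_key_metadata: Optional[bytes] = None
    # Column path (e.g. "address.zipcode") -> opaque KMS key metadata.
    column_key_metadata: Dict[str, bytes] = field(default_factory=dict)


def parse_table_properties(props: Dict[str, str]) -> EncryptionConfig:
    cfg = EncryptionConfig()
    for name, value in props.items():
        if name == "encrypt_algorithm":
            algorithm = value.strip().lower()
            if algorithm not in ALGORITHMS:
                raise ValueError(f"unsupported algorithm: {value!r}")
            cfg.algorithm = algorithm
        elif name == "encrypt_footer_plaintext":
            flag = value.strip().lower()
            if flag not in ("true", "false"):
                raise ValueError(f"expected true or false, got {value!r}")
            cfg.footer_plaintext = flag == "true"
        elif name == "encrypt_footer_key_metadata":
            cfg.footer_key_metadata = base64.b64decode(value, validate=True)
        elif name.startswith(COLUMN_PREFIX):
            column = name[len(COLUMN_PREFIX):]
            cfg.column_key_metadata[column] = base64.b64decode(value, validate=True)
        # Any other table property is not encryption-related and is skipped.
    return cfg


props = {
    "encrypt_algorithm": "aes_gcm",
    "encrypt_col_address.zipcode": base64.b64encode(b"key-v1").decode("ascii"),
}
cfg = parse_table_properties(props)
print(cfg.algorithm, cfg.footer_plaintext, cfg.column_key_metadata)
# aes_gcm False {'address.zipcode': b'key-v1'}
```

Encoding the column path in the property name keeps each column's metadata in its own property, at the cost of scanning all table properties for the `encrypt_col_` prefix.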
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)