[ https://issues.apache.org/jira/browse/ORC-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193291#comment-16193291 ]
Owen O'Malley commented on ORC-14:
----------------------------------
*Laugh* Ignore my previous comment.
I am getting closer. I still need to write this up for the website, but the
general direction is:
* Add support for encrypting columns, where the writer stores two alternatives
in the file:
  * Encrypted original data
  * Unencrypted masked data
* The format change is backwards compatible: old readers will get the
unencrypted masked values.
* It will use the Hadoop KMS by default, although it may be overridden.
* Encryption will be AES (128 or 256 bit) in CTR mode, which allows seeks.
* Different columns may use different master keys. Each writer will generate a
random file id that is used to create a unique encryption key for the column in
that file. To read an encrypted column, the user will need to have the KMS
decrypt the column's encryption key.
* The file and stripe statistics will be encrypted for the encrypted columns.
However, the list of streams in the stripe footer will not be encrypted.
* Masking of data may take several forms:
  * Nullify - make all values null
  * Redact - replace strings and numbers with replacements based on character
  classes ('x' for letters, '9' for numbers, etc.)
  * SHA256 - replace strings and numbers with SHA256 of the value
  * Custom - user defined method
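
To make two of the points above concrete, here is a rough Python sketch. It is not ORC code, and every function name in it is hypothetical. The first part shows why CTR mode allows seeks: the keystream for any 16-byte block depends only on the counter for that block, so a reader can jump to a byte offset by computing the counter and the number of bytes to skip. It assumes the counter is a single 128-bit integer incremented once per block; the actual counter layout may differ. The second part sketches the Nullify, Redact and SHA256 masks on strings. The comment above only names 'x' for letters and '9' for numbers, so mapping uppercase letters to 'X' and leaving other characters unchanged is an assumption, as is hashing the UTF-8 bytes and returning hex.

```python
import hashlib
from typing import Optional, Tuple

AES_BLOCK_BYTES = 16  # AES block size is 16 bytes for 128- and 256-bit keys


def ctr_seek(initial_counter: int, offset: int) -> Tuple[int, int]:
    """Return (counter, skip) needed to start decrypting at a byte offset.

    Assumption: the counter is one 128-bit integer that wraps on overflow.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    block_index, skip = divmod(offset, AES_BLOCK_BYTES)
    counter = (initial_counter + block_index) % (1 << 128)
    return counter, skip


def nullify(value: Optional[str]) -> None:
    """Nullify mask: every value becomes null."""
    return None


def redact(value: str) -> str:
    """Redact mask: replace each character based on its character class."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isupper():
            out.append("X")  # assumption: uppercase is kept distinct
        elif ch.isalpha():
            out.append("x")
        else:
            out.append(ch)  # assumption: punctuation and spaces pass through
    return "".join(out)


def sha256_mask(value: str) -> str:
    """SHA256 mask: hex digest of the UTF-8 bytes (encoding is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()
```

With these definitions, ctr_seek(0, 37) returns (2, 5): the reader starts at counter 2 and discards the first 5 keystream bytes. redact("Owen O'Malley 42") returns "Xxxx X'Xxxxxx 99".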
> Add column level encryption to ORC files
> ----------------------------------------
>
> Key: ORC-14
> URL: https://issues.apache.org/jira/browse/ORC-14
> Project: ORC
> Issue Type: New Feature
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
>
> It would be useful to support column level encryption in ORC files. Since
> each column and its associated index is stored separately, encrypting a
> column separately isn't difficult. In terms of key distribution, it would
> make sense to use an external server like the one in HADOOP-9331.