[ 
https://issues.apache.org/jira/browse/ORC-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193291#comment-16193291
 ] 

Owen O'Malley commented on ORC-14:
----------------------------------

*Laugh* Ignore my previous comment.

I am getting closer. I still need to write this up for the website, but the 
general direction is:

* Add support for encrypting columns, where the writer stores two alternatives 
in the file:
   * Encrypted original data
   * Unencrypted masked data
* The format change is backwards compatible: old readers will get the 
unencrypted masked values.
* It will use the Hadoop KMS by default, although it may be overridden.
* Encryption will be AES (128 or 256 bit) in CTR mode, which allows seeks.
* Different columns may use different master keys. Each writer will generate a 
random file id that is used to create a unique encryption key for the column in 
that file. To read an encrypted column, the user will need to have the KMS 
decrypt the column's encryption key.
* The file and stripe statistics will be encrypted for the encrypted columns. 
However, the list of streams in the stripe footer will not be encrypted.
* Masking of data may have several forms:
  * Nullify - make all values null
  * Redact - replace strings and numbers with replacements based on character 
classes ('x' for letters, '9' for digits, etc.)
  * SHA256 - replace strings and numbers with the SHA256 of the value
  * Custom - user defined method
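
To illustrate why CTR mode allows seeks, here is a minimal sketch using only 
the JDK's javax.crypto classes. It is not the planned ORC implementation. The 
class and method names (CtrSeekSketch, counterAt, decryptFrom) are 
hypothetical, and it assumes the counter is the base IV plus the 16-byte block 
index, which is how the JDK's AES/CTR cipher advances its counter.

```java
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: decrypt an arbitrary byte range of an AES/CTR stream
// without touching the bytes that precede it.
final class CtrSeekSketch {
    private static final int BLOCK = 16; // AES block size in bytes

    // Counter block for the AES block containing byteOffset: the base IV read
    // as a 128-bit big-endian integer plus the block index, modulo 2^128.
    static byte[] counterAt(byte[] baseIv, long byteOffset) {
        BigInteger counter =
            new BigInteger(1, baseIv).add(BigInteger.valueOf(byteOffset / BLOCK));
        byte[] raw = counter.toByteArray();
        byte[] out = new byte[BLOCK];
        int len = Math.min(raw.length, BLOCK);
        // Right-align; drops a sign byte or an overflow byte beyond 128 bits.
        System.arraycopy(raw, raw.length - len, out, BLOCK - len, len);
        return out;
    }

    private static byte[] run(int mode, byte[] key, byte[] counter,
                              byte[] in, int off, int len) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"),
                        new IvParameterSpec(counter));
            return cipher.doFinal(in, off, len);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts the whole stream starting at offset 0.
    static byte[] encrypt(byte[] key, byte[] baseIv, byte[] plaintext) {
        return run(Cipher.ENCRYPT_MODE, key, baseIv, plaintext, 0, plaintext.length);
    }

    // Decrypts only ciphertext[offset, offset + length).
    static byte[] decryptFrom(byte[] key, byte[] baseIv, byte[] ciphertext,
                              int offset, int length) {
        int skip = offset % BLOCK;   // bytes into the first block we do not want
        int start = offset - skip;   // start of the block containing offset
        byte[] plain = run(Cipher.DECRYPT_MODE, key, counterAt(baseIv, offset),
                           ciphertext, start, skip + length);
        return Arrays.copyOfRange(plain, skip, skip + length);
    }
}
```

Calling decryptFrom(key, iv, data, 37, 20) yields the same 20 bytes as 
decrypting the whole stream and slicing it, but only the blocks that overlap 
the requested range are processed.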

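The Redact and SHA256 masks listed above could look roughly like the following 
sketch. The class and method names are hypothetical. Passing other characters 
(spaces, punctuation) through unchanged and hex-encoding the digest are 
assumptions, since the comment only specifies 'x' for letters and '9' for 
digits.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of two of the masks; not the ORC API.
final class MaskSketch {

    // Redact: letters become 'x', digits become '9'.
    // Assumption: every other character is kept as-is.
    static String redact(String value) {
        StringBuilder out = new StringBuilder(value.length());
        value.codePoints().forEach(cp -> {
            if (Character.isLetter(cp)) {
                out.append('x');
            } else if (Character.isDigit(cp)) {
                out.append('9');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // SHA256: replace the value with the digest of its UTF-8 bytes.
    // Assumption: the digest is rendered as lowercase hex.
    static String sha256Hex(String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, MaskSketch.redact("Card 4111-1111, Bob") returns 
"xxxx 9999-9999, xxx", which keeps the shape of the value while hiding its 
content.
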
> Add column level encryption to ORC files
> ----------------------------------------
>
>                 Key: ORC-14
>                 URL: https://issues.apache.org/jira/browse/ORC-14
>             Project: ORC
>          Issue Type: New Feature
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>
> It would be useful to support column level encryption in ORC files. Since 
> each column and its associated index is stored separately, encrypting a 
> column separately isn't difficult. In terms of key distribution, it would 
> make sense to use an external server like the one in HADOOP-9331.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
