[
https://issues.apache.org/jira/browse/ORC-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821188#comment-16821188
]
Xinli Shang commented on ORC-14:
--------------------------------
Yes, I looked at it earlier and also looked at the one you published last year.
They are great slides!
I agree with you about the configuration. What we do in Parquet is use
something like table properties in HMS or other places to specify the column
crypto settings (sensitivity, encryption algorithm, etc.). All of this
information is sent through the technical stack (application layer, query
engines, etc.) to Parquet (it could be ORC too) inside the schema. The Parquet
plugin specified in PARQUET-1396 consumes the crypto-settings part of the
schema and provisions the encryption properties that are needed by Parquet
encryption (PARQUET-1178).
The benefit of this solution is that it avoids most changes to the query
engines. We just treat them as a tunnel that passes the crypto settings inside
the schema through to Parquet (or ORC). Hence it eases and accelerates adoption.
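To make the tunnel idea concrete, here is a minimal sketch in Python. It is not
the PARQUET-1396 plugin interface or any real Parquet/ORC API; the class names,
function names, and metadata keys (crypto.sensitive, crypto.algorithm,
crypto.key_id) are all hypothetical and chosen only for illustration. It
assumes the crypto settings travel as per-column key/value metadata in the
schema, the engine forwards the schema untouched, and a plugin at the file
format layer turns that metadata into encryption properties.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Column:
    name: str
    type: str
    # Free-form key/value metadata; the crypto settings ride along here.
    # (Hypothetical keys: crypto.sensitive, crypto.algorithm, crypto.key_id.)
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Schema:
    columns: List[Column]


def query_engine_passthrough(schema: Schema) -> Schema:
    """Stand-in for the query engine: it acts as a tunnel and forwards
    the schema, including its metadata, without interpreting it."""
    return schema


def build_encryption_properties(schema: Schema) -> Dict[str, Dict[str, str]]:
    """Hypothetical plugin step at the file format layer: read the
    per-column crypto metadata and produce writer-side encryption
    properties for the columns marked sensitive."""
    props: Dict[str, Dict[str, str]] = {}
    for col in schema.columns:
        if col.metadata.get("crypto.sensitive") == "true":
            props[col.name] = {
                "algorithm": col.metadata.get("crypto.algorithm", ""),
                "key_id": col.metadata.get("crypto.key_id", ""),
            }
    return props
```

For example, a schema with a plain id column and an ssn column tagged
crypto.sensitive=true would yield properties for ssn only, and the engine code
in the middle never has to know what the tags mean.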
I talked about PARQUET-1396 at this year's Apache Hadoop Contributor Meetup.
You can find it here: [https://www.youtube.com/watch?v=W38CrTUJ3YM&t=140s]
> Add column level encryption to ORC files
> ----------------------------------------
>
> Key: ORC-14
> URL: https://issues.apache.org/jira/browse/ORC-14
> Project: ORC
> Issue Type: New Feature
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Priority: Major
> Attachments: columnEncryption.png
>
>
> It would be useful to support column level encryption in ORC files. Since
> each column and its associated index is stored separately, encrypting a
> column separately isn't difficult. In terms of key distribution, it would
> make sense to use an external server like the one in HADOOP-9331.