[jira] [Commented] (HADOOP-15006) Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS

Steve Moist (JIRA) Tue, 02 Jan 2018 11:59:27 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308621#comment-16308621
 ]


Steve Moist commented on HADOOP-15006:
--------------------------------------

{quote}
It appears there will be NO CHANGES to the KMS, right? 
{quote}
Yes, there are no changes to the KMS. I expect to be able to do all KMS 
actions through the existing API calls.
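
As a rough illustration of what "existing API calls" could mean here, the 
sketch below builds the KMS REST URLs for generating and decrypting EDEKs. 
The endpoint paths follow the KMS REST documentation as I understand it; the 
host, port, key names and helper function names are placeholders made up for 
this example, not part of any proposal.

```python
from urllib.parse import quote, urlencode

# Placeholder KMS address; host and port are assumptions for this sketch.
KMS_BASE = "http://kms.example.com:9600/kms/v1"


def generate_eek_url(key_name: str, num_keys: int = 1, base: str = KMS_BASE) -> str:
    """URL for asking the KMS to generate EDEKs under an existing key (GET)."""
    query = urlencode({"eek_op": "generate", "num_keys": num_keys})
    return f"{base}/key/{quote(key_name, safe='')}/_eek?{query}"


def decrypt_eek_url(version_name: str, base: str = KMS_BASE) -> str:
    """URL for asking the KMS to decrypt an EDEK (POST carrying name, iv, material)."""
    query = urlencode({"eek_op": "decrypt"})
    return f"{base}/keyversion/{quote(version_name, safe='')}/_eek?{query}"
```

Nothing in this sketch talks to a server; it only shows that the calls S3A 
would need already have endpoints.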

{quote}
Do you intend to add a hadoop crypto wrapper, or is there intention to change 
hdfs's CryptoAdmin? I'd suggest the former.
{quote}
I'm planning to change the CryptoAdmin to support S3; it would call out to 
an S3aCryptoAdmin, and the CLI invocation would change.  While this may be a 
bigger change now, renaming the command can start paving the way for Azure CSE 
further down the road.

{quote}
About metadata persistence, does option #5 mean to add a new column to the S3G 
DynamoDB table, to store the edeks along with the other metadata of a file? I 
feel this is the safest way because we don't have to worry about consistency 
particular to edeks.
{quote}
No, it is not meant to be part of S3Guard, because S3Guard is able to delete 
the table and refresh it.  Doing so would lose the EDEKs and therefore cause 
data loss.

{quote}
How is the EZ information stored? 
{quote}
It is stored as BEZI.  Similarly, the NN could read the BEZI, or S3a could 
cache it as well.

{quote}
What's the behavior when KMS is slow and we create a file? If the generateEDEK 
exhausted cache, where does it hang? When this happens does it impact other 
operation? (This is a pain for NN due to the single namespace lock, may not be 
an issue for s3a but I'm curious)
{quote}
S3a would have to wait for the KMS to generate an EDEK.  I'm not sure how 
this would affect other operations.
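
To make the waiting concrete, here is a toy sketch of an EDEK cache in which 
a create call that finds the cache empty blocks on the KMS itself.  The 
EdekCache class and the slow_kms helper are hypothetical names invented for 
this example; the sketch has no shared lock, so it says nothing definitive 
about how a real S3a implementation would interact with other operations.

```python
import queue
import time


class EdekCache:
    """Toy EDEK cache; `generate` stands in for a KMS round trip."""

    def __init__(self, generate, prefill=2):
        self._generate = generate
        self._ready = queue.Queue()
        # Warm the cache up front, as a real client might do in the background.
        for _ in range(prefill):
            self._ready.put(generate())

    def take(self):
        """Return a cached EDEK, or wait on the KMS if the cache is exhausted."""
        try:
            return self._ready.get_nowait()
        except queue.Empty:
            # Cache exhausted: the calling thread waits for the KMS here.
            return self._generate()


def slow_kms(delay=0.05):
    """Hypothetical slow KMS: sleeps for `delay` seconds, then returns a fake EDEK."""
    counter = iter(range(1_000_000))

    def generate():
        time.sleep(delay)
        return f"edek-{next(counter)}"

    return generate
```

In this sketch the first two take() calls return immediately from the cache, 
and the third one pays the full KMS latency in the caller's thread.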

{quote}
I saw raw bytes mentioned. For hdfs, the way to access raw bytes is through a 
special path: /.reserved/raw/original_path. How is this done in s3?
{quote}
I'm not too sure about that piece yet.  It could be a virtual object key (such 
as bucket-name/.reserved/raw/original_path) that isn't actually created in S3 
and is interpreted on a copy or move command to mean the data should not be 
decrypted.  I would like to include it, as users could potentially use DistCp 
to copy encrypted data from an HDFS EZ to an S3 EZ without having to decrypt 
it, since they share the same EZK.  I think that would be a great feature, but 
this subject needs to be fleshed out more.
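
One way the virtual key idea could work is sketched below: the client strips 
a reserved prefix from the key and remembers that the bytes should be passed 
through without decryption.  The RAW_PREFIX constant and the resolve function 
are hypothetical names for this example only; nothing like them exists in S3A 
today.

```python
# Mirrors the HDFS /.reserved/raw convention; an assumption for this sketch.
RAW_PREFIX = ".reserved/raw/"


def resolve(path: str):
    """Split 'bucket/key' into (bucket, real_key, raw).

    raw is True when the key used the virtual raw prefix, meaning the caller
    should copy the encrypted bytes as-is instead of decrypting them.
    """
    bucket, _, key = path.partition("/")
    raw = key.startswith(RAW_PREFIX)
    if raw:
        # The virtual prefix is never stored in S3; drop it to get the real key.
        key = key[len(RAW_PREFIX):]
    return bucket, key, raw
```

For example, resolve("bucket-name/.reserved/raw/data/part-0000") would yield 
the real key data/part-0000 with raw set to True, while the same path without 
the prefix yields raw set to False.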

> Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS
> ---------------------------------------------------------------
>
>                 Key: HADOOP-15006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15006
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3, kms
>            Reporter: Steve Moist
>            Priority: Minor
>         Attachments: S3-CSE Proposal.pdf
>
>
> This is for the proposal to introduce Client Side Encryption to S3 in such a 
> way that it can leverage HDFS transparent encryption, use the Hadoop KMS to 
> manage keys, use the `hdfs crypto` command line tools to manage encryption 
> zones in the cloud, and enable distcp to copy from HDFS to S3 (and 
> vice-versa) with data still encrypted.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

