[ 
https://issues.apache.org/jira/browse/HADOOP-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302145#comment-16302145
 ] 

Xiao Chen commented on HADOOP-15006:
------------------------------------

Hi [~moist],

Thanks for posting the design doc here! I did a quick review and have a few 
comments.
I'm not an expert on s3, so my comments come from the kms and hdfs side. Some 
of them may be short on s3 knowledge... feel free to point me to the related 
jira or doc if so. :)

- It appears there will be NO CHANGES to the KMS, right? We are doing the s3a 
equivalent of hdfs crypto and hdfs clients, and all required KMS actions can be 
achieved using existing KMS APIs.
- {{hdfs crypto}}: (btw, there is no {{hadoop crypto}} currently, only {{hdfs 
crypto}}.)
I understand that for hdfs r/w operations, we can update hdfs-site and 
core-site and then use any of the hadoop fs CLIs. But {{CryptoAdmin}} 
currently only calls into hdfs. Do you intend to add a hadoop crypto wrapper, 
or is the intention to change hdfs's CryptoAdmin? I'd suggest the former...
- About metadata persistence, does option #5 mean adding a new column to the 
S3G DynamoDB table to store the edeks along with a file's other metadata? 
I feel this is the safest way, because we don't have to worry about 
consistency issues specific to edeks.
- How is the EZ information stored? In HDFS it's an xattr on the zone root's 
INodeDir, and is scanned upon NN start. We'd need a similarly reliable way 
to make sure all files within an EZ are encrypted. The architecture graph 
only shows that BEZI and OEMI are stored separately.
- What's the behavior when KMS is slow and we create a file? If generateEDEK 
has exhausted the cache, where does the call hang? When this happens, does it 
impact other operations? (This is a pain for NN due to the single namespace 
lock; it may not be an issue for s3a, but I'm curious.)
- I saw raw bytes mentioned. For hdfs, the way to access raw bytes is through a 
special path: {{/.reserved/raw/original_path}}. How is this done in s3?
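
To make the raw-bytes point concrete, here is a minimal sketch of the HDFS 
path convention mentioned above, where the raw view of a file lives under 
{{/.reserved/raw}}. The helper name {{to_raw_path}} is made up for 
illustration; it is not a Hadoop API and says nothing about how s3a would 
expose raw bytes.

```python
import posixpath

# Prefix HDFS uses to expose the raw (still-encrypted) bytes of a file.
RAW_PREFIX = "/.reserved/raw"


def to_raw_path(path: str) -> str:
    """Map an absolute HDFS path to its /.reserved/raw form.

    Hypothetical helper for illustration only; not part of Hadoop.
    """
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path, got {path!r}")
    # Collapse duplicate and trailing slashes. normpath preserves a leading
    # "//", so reduce any run of leading slashes to a single "/".
    normalized = "/" + posixpath.normpath(path).lstrip("/")
    if normalized == RAW_PREFIX or normalized.startswith(RAW_PREFIX + "/"):
        return normalized  # already a raw path, leave it alone
    if normalized == "/":
        return RAW_PREFIX
    return RAW_PREFIX + normalized
```

For example, {{to_raw_path("/zone/file.txt")}} gives 
{{/.reserved/raw/zone/file.txt}}. An s3a equivalent would need some similar 
way to address the undecrypted object.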

> Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS
> ---------------------------------------------------------------
>
>                 Key: HADOOP-15006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15006
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3, kms
>            Reporter: Steve Moist
>            Priority: Minor
>         Attachments: S3-CSE Proposal.pdf
>
>
> This is for the proposal to introduce Client Side Encryption to S3 in such a 
> way that it can leverage HDFS transparent encryption, use the Hadoop KMS to 
> manage keys, use the `hdfs crypto` command line tools to manage encryption 
> zones in the cloud, and enable distcp to copy from HDFS to S3 (and 
> vice-versa) with data still encrypted.



