[ https://issues.apache.org/jira/browse/HADOOP-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308621#comment-16308621 ]
Steve Moist commented on HADOOP-15006:
--------------------------------------
{quote}
It appears there will be NO CHANGES to the KMS, right?
{quote}
Yes, there are no changes to the KMS, and I expect to be able to do all KMS
actions through the existing API calls.
{quote}
Do you intend to add a hadoop crypto wrapper, or is there intention to change
hdfs's CryptoAdmin? I'd suggest the former.
{quote}
I'm planning to change the CryptoAdmin to support S3; it would call out into
the S3aCryptoAdmin, and the CLI invocation would change. While this may be a
bigger change now, renaming the command can start paving the way for Azure CSE
further down the road.
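As a rough illustration of that call-out, here is a minimal sketch of
dispatching on the URI scheme. It is only a sketch under assumptions:
`ZoneAdmin` and `CryptoAdminDispatch` are hypothetical names, not existing
Hadoop classes, and the real CryptoAdmin/S3aCryptoAdmin split may look quite
different.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-filesystem admin; an S3aCryptoAdmin would be one implementation. */
interface ZoneAdmin {
    String createZone(String keyName, URI path);
}

/** Hypothetical front end that picks an admin from the path's URI scheme. */
final class CryptoAdminDispatch {
    private final Map<String, ZoneAdmin> adminsByScheme;

    CryptoAdminDispatch(Map<String, ZoneAdmin> adminsByScheme) {
        // Defensive copy so later changes to the caller's map have no effect.
        this.adminsByScheme = new HashMap<>(adminsByScheme);
    }

    String createZone(String keyName, URI path) {
        String scheme = path.getScheme();
        ZoneAdmin admin = (scheme == null) ? null : adminsByScheme.get(scheme);
        if (admin == null) {
            throw new IllegalArgumentException("No crypto admin registered for " + path);
        }
        return admin.createZone(keyName, path);
    }
}
```

Registering further schemes later (for example one for Azure) would not
require touching the dispatch itself, which is the appeal of this shape.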
{quote}
About metadata persistence, does option #5 mean to add a new column to the S3G
DynamoDB table, to store the edeks along with the other metadata of a file? I
feel this is the safest way because we don't have to worry about consistency
particular to edeks.
{quote}
No, it is not meant to be part of S3Guard, because the S3Guard table can be
deleted and refreshed. Doing so would lose the EDEKs and therefore cause data
loss.
{quote}
How is the EZ information stored?
{quote}
It is stored as BEZI. Similarly, the NN could read the BEZI or S3a could cache
that as well.
{quote}
What's the behavior when KMS is slow and we create a file? If the generateEDEK
exhausted cache, where does it hang? When this happens does it impact other
operation? (This is a pain for NN due to the single namespace lock, may not be
an issue for s3a but I'm curious)
{quote}
S3a would have to wait for the KMS to generate an EDEK. I'm not sure how this
would affect other operations.
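To make the question concrete, here is one possible shape for the client-side
wait, as a sketch only. `EdekSource`, its in-memory cache, and the `Supplier`
standing in for the KMS call are all hypothetical; the proposal does not
specify any of them. In this shape, only the thread creating the file waits
when the cache is empty, and the wait is bounded by a timeout.

```java
import java.util.Collection;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical EDEK source: serves from a local cache, else waits on the KMS. */
final class EdekSource {
    private final Queue<byte[]> cache;
    private final Supplier<byte[]> kmsCall; // stand-in for a real KMS request

    EdekSource(Collection<byte[]> prefetched, Supplier<byte[]> kmsCall) {
        this.cache = new ConcurrentLinkedQueue<>(prefetched);
        this.kmsCall = kmsCall;
    }

    /** Returns an EDEK, or empty if the KMS did not answer within the timeout. */
    Optional<byte[]> next(long timeoutMillis) {
        byte[] cached = cache.poll();
        if (cached != null) {
            return Optional.of(cached);
        }
        // Cache exhausted: run the KMS call on another thread and wait here.
        CompletableFuture<byte[]> pending = CompletableFuture.supplyAsync(kmsCall);
        try {
            return Optional.ofNullable(pending.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("KMS call failed", e.getCause());
        }
    }
}
```

Whether a timeout and an empty result are the right behavior for file creation,
rather than blocking indefinitely, is an open design choice.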
{quote}
I saw raw bytes mentioned. For hdfs, the way to access raw bytes is through a
special path: /.reserved/raw/original_path. How is this done in s3?
{quote}
I'm not too sure about that piece yet. It could be a virtual object key (such
as bucket-name/.reserved/raw/original_path) that isn't actually created in S3
but is interpreted on a copy or move command to mean that the data should not
be decrypted. I would like to include it, as users could potentially use
DistCp to copy encrypted data from an HDFS EZ to an S3 EZ without having to
decrypt it, because they share the same EZK. I think that feature would be a
great thing to have, but this subject needs to be fleshed out more.
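A minimal sketch of how such a virtual key might be interpreted, assuming the
prefix mirrors the HDFS convention. `RawKeyResolver` is a hypothetical helper,
not an existing S3A class, and the prefix handling is an assumption rather
than a settled design.

```java
/** Hypothetical helper; nothing like this exists in S3A today. */
final class RawKeyResolver {
    // Virtual prefix modelled on HDFS's /.reserved/raw; using it for S3 is an assumption.
    static final String RAW_PREFIX = ".reserved/raw/";

    /** The real object key, plus whether the caller asked for undecrypted bytes. */
    static final class Resolved {
        final String objectKey;
        final boolean raw;

        Resolved(String objectKey, boolean raw) {
            this.objectKey = objectKey;
            this.raw = raw;
        }
    }

    /** Strips the virtual prefix if present; no object is created under it. */
    static Resolved resolve(String requestedKey) {
        // Tolerate a single leading slash so both key styles resolve the same way.
        String key = requestedKey.startsWith("/") ? requestedKey.substring(1) : requestedKey;
        if (key.startsWith(RAW_PREFIX)) {
            return new Resolved(key.substring(RAW_PREFIX.length()), true);
        }
        return new Resolved(key, false);
    }
}
```

A copy or move implementation could then check the `raw` flag and skip
decryption, transferring the stored bytes and the EDEK unchanged.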
> Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS
> ---------------------------------------------------------------
>
> Key: HADOOP-15006
> URL: https://issues.apache.org/jira/browse/HADOOP-15006
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3, kms
> Reporter: Steve Moist
> Priority: Minor
> Attachments: S3-CSE Proposal.pdf
>
>
> This is for the proposal to introduce Client Side Encryption to S3 in such a
> way that it can leverage HDFS transparent encryption, use the Hadoop KMS to
> manage keys, use the `hdfs crypto` command line tools to manage encryption
> zones in the cloud, and enable distcp to copy from HDFS to S3 (and
> vice-versa) with data still encrypted.