[ https://issues.apache.org/jira/browse/HADOOP-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234156#comment-16234156 ]

Steve Loughran commented on HADOOP-13887:
-----------------------------------------

This initial patch was just about turning client-side encryption on. Doing that 
produces data whose EOF may be slightly less than len(block), which will break 
all client code that navigates off EOF, assumes the length of the data is the 
amount it can copy, and so on. And if you lose the key, you are on your own.
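To make the length mismatch concrete, here is a minimal, self-contained Java sketch. It assumes the JDK's AES/CBC/PKCS5Padding purely as a stand-in for whatever scheme the AWS SDK's client-side encryption actually uses, and it does not touch S3A or the SDK. The byte array length plays the role of the object length the store reports; the class and method names are hypothetical. With CBC and PKCS#5 padding the ciphertext is rounded up to the next 16-byte block (a full extra block when the input is already aligned), so 1000 bytes of data are stored as 1008.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class CseLengthMismatch {
    // Illustrative stand-in cipher; NOT necessarily what the AWS SDK uses for CSE.
    static final String TRANSFORM = "AES/CBC/PKCS5Padding";

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        return kg.generateKey();
    }

    // In this sketch, what would be stored in the bucket: padded ciphertext.
    static byte[] encrypt(byte[] plaintext, SecretKey key, byte[] iv) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance(TRANSFORM);
        c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return c.doFinal(plaintext);
    }

    static InputStream decryptingStream(byte[] stored, SecretKey key, byte[] iv) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance(TRANSFORM);
        c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherInputStream(new ByteArrayInputStream(stored), c);
    }

    // Where the reader really hits EOF: the number of plaintext bytes the stream yields.
    static long bytesUntilEof(byte[] stored, SecretKey key, byte[] iv)
            throws GeneralSecurityException, IOException {
        long total = 0;
        try (InputStream in = decryptingStream(stored, key, iv)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // A "typical" client: trusts the reported length and readFully()s exactly that many bytes.
    static boolean naiveReadHitsEof(byte[] stored, long reportedLength, SecretKey key, byte[] iv)
            throws GeneralSecurityException, IOException {
        byte[] buf = new byte[(int) reportedLength];
        try (DataInputStream in = new DataInputStream(decryptingStream(stored, key, iv))) {
            in.readFully(buf);
            return false;
        } catch (EOFException e) {
            return true; // ran out of plaintext before reaching the reported length
        }
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = newKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        byte[] stored = encrypt(new byte[1000], key, iv);
        long reported = stored.length; // stands in for the object length the store reports
        System.out.println("reported length:  " + reported);                                  // 1008
        System.out.println("bytes until EOF:  " + bytesUntilEof(stored, key, iv));              // 1000
        System.out.println("naive read fails: " + naiveReadHitsEof(stored, reported, key, iv)); // true
    }
}
```

Any client that sizes a buffer, a split, or a copy from the reported length, as `naiveReadHitsEof` does, fails on its final read; a scheme that appends an authentication tag instead of padding has the same property.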

At the same time, I can see the appeal of some form of support for this purely 
for a backup/restore process, e.g. encrypting data before it goes to Glacier and 
decrypting it as part of a copy. I think that can and should be done outside the 
s3a lib: you can never reliably use client-side encrypted S3 data as a source in 
any MR, Hive, Tez, Spark etc. operation. People will end up encrypting their 
data, then filing bugs and support calls trying to understand why their queries 
are all failing.
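The backup/restore case can be handled without changing S3A by encrypting in the copy step itself. The sketch below assumes JDK AES/GCM, a key supplied by the caller (key management, the hard part, is out of scope), and a hypothetical layout of a 12-byte nonce followed by ciphertext and tag. The class and method names are made up for illustration; none of this is an existing Hadoop or AWS SDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class CopyTimeEncryption {
    static final int IV_BYTES = 12;   // 96-bit GCM nonce, written in clear at the start of the copy
    static final int TAG_BITS = 128;  // GCM authentication tag, appended by the cipher

    // Plain byte pump between two streams.
    static void pump(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Encrypt while copying (e.g. source data -> archive). Closes dst.
    static void encryptingCopy(InputStream src, OutputStream dst, SecretKey key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv); // fresh nonce per object; never reuse one with the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        dst.write(iv);
        try (OutputStream out = new CipherOutputStream(dst, cipher)) {
            pump(src, out); // close() runs doFinal() and appends the tag
        }
    }

    // Decrypt while copying back (restore). Closes src.
    static void decryptingCopy(InputStream src, OutputStream dst, SecretKey key)
            throws IOException, GeneralSecurityException {
        DataInputStream din = new DataInputStream(src);
        byte[] iv = new byte[IV_BYTES];
        din.readFully(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        try (InputStream in = new CipherInputStream(din, cipher)) {
            pump(in, dst);
        }
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] data = "some data to archive".getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream archived = new ByteArrayOutputStream();
        encryptingCopy(new ByteArrayInputStream(data), archived, key);
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        decryptingCopy(new ByteArrayInputStream(archived.toByteArray()), restored, key);

        System.out.println("plaintext bytes: " + data.length);     // 20
        System.out.println("archived bytes:  " + archived.size()); // 20 + 12 + 16 = 48
        System.out.println("round trip ok:   " + Arrays.equals(data, restored.toByteArray()));
    }
}
```

The archived copy is 28 bytes longer than its source, which is exactly why such objects make poor direct inputs to query engines but are fine for a copy tool that decrypts on the way back. With the JDK's default provider, GCM decryption typically buffers the whole ciphertext before releasing plaintext, so a tool for large archives would more likely use a chunked format.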

*Proposed*: change the title of this JIRA to "Encrypt S3A data client-side with 
AWS SDK" to make the goal clear, then close it as WONTFIX with a clear 
explanation. It's not that we can't take on the code Igor has written; it's that 
the assumption EOF = len(file) is so fundamental that we can't hand this to 
downstream code and expect it to cope.

The other grand proposal is, well, big. And as it goes near KMS and encryption, 
it is beyond my scope. It also isn't going to interoperate with any other S3 
client, which is a significant limitation. I'm certainly not going to go near it, 
and I wouldn't be in a position to review anything but the "how does this glue to 
the input stream" issue. And even there, fear would generally keep me away.

*Proposed*: create a new JIRA, "Encrypt S3A data client-side with Hadoop 
libraries & Hadoop KMS", put that proposal there, and for now let people comment 
on it and see where it goes.



> Support for client-side encryption in S3A file system
> -----------------------------------------------------
>
>                 Key: HADOOP-13887
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13887
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Jeeyoung Kim
>            Assignee: Igor Mazur
>            Priority: Minor
>         Attachments: HADOOP-13887-002.patch, HADOOP-13887-007.patch, 
> HADOOP-13887-branch-2-003.patch, HADOOP-13897-branch-2-004.patch, 
> HADOOP-13897-branch-2-005.patch, HADOOP-13897-branch-2-006.patch, 
> HADOOP-13897-branch-2-008.patch, HADOOP-13897-branch-2-009.patch, 
> HADOOP-13897-branch-2-010.patch, HADOOP-13897-branch-2-012.patch, 
> HADOOP-13897-branch-2-014.patch, HADOOP-13897-trunk-011.patch, 
> HADOOP-13897-trunk-013.patch, HADOOP-14171-001.patch, S3-CSE Proposal.pdf
>
>
> Expose the client-side encryption option documented in the Amazon S3 
> documentation:
> http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
> Currently this is not exposed in Hadoop, but it is exposed as an option in the 
> AWS Java SDK, which Hadoop currently includes. It should be trivial to propagate 
> this as a parameter passed to the S3 client used in S3AFileSystem.java.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
