[
https://issues.apache.org/jira/browse/HADOOP-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234156#comment-16234156
]
Steve Loughran commented on HADOOP-13887:
-----------------------------------------
This initial patch was just about turning client-side encryption on. Doing that
makes for data whose EOF may be slightly less than len(block), which will break
all client code which navigates relative to EOF, assumes the length of the data
is the amount it can copy, etc. And if you lose the key, you are on your own.
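
To make the failure mode concrete, here is a minimal sketch in Python rather
than against the real S3A or AWS SDK classes. Everything in it is hypothetical:
the names `open_decrypted`, `read_fully` and `read_to_eof`, the sample data,
and the assumption that the store reports a length padded up to a 16-byte
boundary. It only illustrates why "read exactly the reported length" fails
while "read until EOF" survives.

```python
import io

# Hypothetical sample data; the real payload would be the decrypted object.
PLAINTEXT = b"row-1\nrow-2\nrow-3\n"  # 18 bytes

# Assumption for illustration: the store reports the ciphertext length,
# padded up to the next 16-byte boundary, not the plaintext length.
BLOCK = 16
REPORTED_LEN = (len(PLAINTEXT) // BLOCK + 1) * BLOCK  # 32


def open_decrypted():
    """Stand-in for an input stream that decrypts on read."""
    return io.BytesIO(PLAINTEXT)


def read_fully(stream, length):
    """Read exactly `length` bytes or raise EOFError."""
    buf = bytearray()
    while len(buf) < length:
        chunk = stream.read(length - len(buf))
        if not chunk:
            raise EOFError(f"EOF after {len(buf)} bytes, expected {length}")
        buf.extend(chunk)
    return bytes(buf)


def read_to_eof(stream, chunk_size=8):
    """Read until the stream itself signals EOF, ignoring any reported length."""
    out = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out.extend(chunk)
    return bytes(out)


if __name__ == "__main__":
    # Code that trusts the reported length hits EOF early.
    try:
        read_fully(open_decrypted(), REPORTED_LEN)
    except EOFError as e:
        print("length-trusting read failed:", e)

    # Code that reads to EOF gets all of the data.
    print("read-to-EOF got", len(read_to_eof(open_decrypted())), "bytes")
```

A standalone copy tool can be written in the read-to-EOF style; general
downstream code cannot be assumed to be.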
At the same time, I can see the appeal of some form of support for this purely
for some backup/restore process, e.g. for encrypting data before -> glacier,
decrypting it as part of a copy. I think that can/should be done outside the
s3a lib: you can never reliably use client-side encrypted S3 data as a source in
any MR, Hive, Tez, Spark &c operation. People will end up encrypting their
data, then filing bugs/support calls trying to understand why their queries
are all failing.
*Proposed*: change title of JIRA to "Encrypt S3A data client-side with AWS
SDK", to make the goal clear, then close as a wontfix with a clear explanation.
It's not that we can't take on code that Igor has done, it's that the
assumption that EOF=Len(file) is so fundamental that we can't hand downstream
code data which breaks it and expect that code to handle it.
The other grand proposal is, well, big. And as it goes near KMS & encryption,
beyond my scope. It also isn't going to interact with any other S3 client,
which is a significant limitation. I'm certainly not going to go near it, and I
wouldn't be in a place to review anything but the "how does this glue to the
input stream" issue. And even there, fear would generally keep me away from it.
*Proposed*: create a new JIRA, "Encrypt S3A data client-side with Hadoop
libraries & Hadoop KMS", put that proposal there, and for now, let people
comment on the proposal & see where it goes.
> Support for client-side encryption in S3A file system
> -----------------------------------------------------
>
> Key: HADOOP-13887
> URL: https://issues.apache.org/jira/browse/HADOOP-13887
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Jeeyoung Kim
> Assignee: Igor Mazur
> Priority: Minor
> Attachments: HADOOP-13887-002.patch, HADOOP-13887-007.patch,
> HADOOP-13887-branch-2-003.patch, HADOOP-13897-branch-2-004.patch,
> HADOOP-13897-branch-2-005.patch, HADOOP-13897-branch-2-006.patch,
> HADOOP-13897-branch-2-008.patch, HADOOP-13897-branch-2-009.patch,
> HADOOP-13897-branch-2-010.patch, HADOOP-13897-branch-2-012.patch,
> HADOOP-13897-branch-2-014.patch, HADOOP-13897-trunk-011.patch,
> HADOOP-13897-trunk-013.patch, HADOOP-14171-001.patch, S3-CSE Proposal.pdf
>
>
> Expose the client-side encryption option documented in Amazon S3
> documentation -
> http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
> Currently this is not exposed in Hadoop but it is exposed as an option in AWS
> Java SDK, which Hadoop currently includes. It should be trivial to propagate
> this as a parameter passed to the S3client used in S3AFileSystem.java.