[
https://issues.apache.org/jira/browse/HADOOP-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated HADOOP-13887:
------------------------------------
Release Note:
Adds support for client side encryption in AWS S3,
with keys managed by AWS-KMS.
Read the documentation in encryption.md very, very carefully before
use and consider it unstable.
S3-CSE is enabled in the existing configuration option
"fs.s3a.server-side-encryption-algorithm":
fs.s3a.server-side-encryption-algorithm=CSE-KMS
fs.s3a.server-side-encryption.key=<KMS_KEY_ID>
You cannot enable CSE and SSE in the same client, although
you can still enable a default SSE option in the S3 console.
* Not compatible with S3Guard.
* Filesystem list/get status operations subtract 16 bytes from the length
of all files >= 16 bytes long to compensate for the padding which CSE
adds.
* The SDK always warns about the specific algorithm chosen being
deprecated. It is critical to use this algorithm for ranged
GET requests to work (i.e. random IO). Ignore.
* Unencrypted files CANNOT BE READ.
The entire bucket SHOULD be encrypted with S3-CSE.
* Uploading files may be a bit slower as blocks are now
written sequentially.
* The Multipart Upload API is disabled when S3-CSE is active.
was:
We cannot guarantee that client-side encryption works in applications which use
the Hadoop FileSystem APIs.
Client-side encryption breaks a fundamental expectation of lots of code: that
the amount of data you can read from a file equals the size of the file when
listed. Because of padding and roundup, you may get less data on read than you
think, seek(length(file)-1) can fail, read(bytes,length(file)) may fail, etc.
Be assured, code will break. That's going to be found as you use this feature,
and maybe it can be fixed. It may also be that it can't, not easily.
If you do find problems, you are going to have to take it up with the
particular application/library which has the issue, and see whether or not they
can/will fix it. We cannot fix it in the Hadoop S3A filesystem, as this code is
only reporting back the file lengths supplied by the S3 endpoint: if there is a
mismatch, the client code does not know of it. Indeed, it's likely that S3
itself doesn't know of it, because the decryption is taking place on the client.
For this reason, you cannot simply turn on client-side encryption and expect
everything to "just" work. You should restrict it to specific uses which you
can test, such as writing data out for use by an external application, or
carefully importing data uploaded by other processes.
> Encrypt S3A data client-side with AWS SDK (S3-CSE)
> --------------------------------------------------
>
> Key: HADOOP-13887
> URL: https://issues.apache.org/jira/browse/HADOOP-13887
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Jeeyoung Kim
> Assignee: Igor Mazur
> Priority: Minor
> Labels: pull-request-available
> Attachments: HADOOP-13887-002.patch, HADOOP-13887-007.patch,
> HADOOP-13887-branch-2-003.patch, HADOOP-13897-branch-2-004.patch,
> HADOOP-13897-branch-2-005.patch, HADOOP-13897-branch-2-006.patch,
> HADOOP-13897-branch-2-008.patch, HADOOP-13897-branch-2-009.patch,
> HADOOP-13897-branch-2-010.patch, HADOOP-13897-branch-2-012.patch,
> HADOOP-13897-branch-2-014.patch, HADOOP-13897-trunk-011.patch,
> HADOOP-13897-trunk-013.patch, HADOOP-14171-001.patch, S3-CSE Proposal.pdf
>
> Time Spent: 11h 40m
> Remaining Estimate: 0h
>
> Expose the client-side encryption option documented in Amazon S3
> documentation -
> http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
> Currently this is not exposed in Hadoop but it is exposed as an option in AWS
> Java SDK, which Hadoop currently includes. It should be trivial to propagate
> this as a parameter passed to the S3client used in S3AFileSystem.java
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]