[ https://issues.apache.org/jira/browse/HADOOP-13887?focusedWorklogId=624631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-624631 ]
ASF GitHub Bot logged work on HADOOP-13887:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 19/Jul/21 20:38
Start Date: 19/Jul/21 20:38
Worklog Time Spent: 10m
Work Description: steveloughran commented on a change in pull request #2706:
URL: https://github.com/apache/hadoop/pull/2706#discussion_r672614095
##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/encryption.md
##########
@@ -70,6 +76,53 @@ by Amazon's Key Management Service, a key referenced by name in the uploading cl
* SSE-C : the client specifies an actual base64 encoded AES-256 key to be used
to encrypt and decrypt the data.
+Encryption options
+
+| type | encryption | config on write | config on read |
+|-------|---------|-----------------|----------------|
+| `SSE-S3` | server side, AES256 | encryption algorithm | none |
+| `SSE-KMS` | server side, KMS key | key used to encrypt/decrypt | none |
+| `SSE-C` | server side, custom key | encryption algorithm and secret | encryption algorithm and secret |
+| `CSE-KMS` | client side, KMS key | encryption algorithm and key ID | encryption algorithm |
+
+With server-side encryption, the data is uploaded to S3 unencrypted (but wrapped by the HTTPS
+encryption channel).
+The data is dynamically encrypted and decrypted in the S3 Store, as needed.
+
+A server side algorithm can be enabled by default for a bucket, so that
+whenever data is uploaded unencrypted a default encryption algorithm is added.
+When data is encrypted with SSE-S3 or SSE-KMS it is transparent to all clients
+downloading the data.
+SSE-C is different in that every client must know the secret key needed to decrypt the data.
+
+Working with SSE-C data is hard because every client must be configured to use the
+algorithm and supply the key. In particular, it is very hard to mix SSE-C
+encrypted objects in the same S3 bucket with objects encrypted with other
+algorithms or unencrypted; the S3A client
+(and other applications) get very confused.
+
+KMS-based key encryption is powerful as access to a key can be restricted to
+specific users/IAM roles. However, use of the key is billed and can be
+throttled. Furthermore as a client seeks around a file, the KMS key *may* be
Review comment:
We've not done anything with client-side keys. This is probably relevant on
random IO, where we do many GETs across a single input stream, so the key
could be cached there.
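
The caching idea in this comment can be sketched roughly as follows. This is a minimal illustration only, not S3A code: `StreamKeyCache`, `fake_kms_decrypt`, and the key ID are all hypothetical names, and the stand-in function simply fabricates key bytes in place of a real (billed, possibly throttled) KMS request. It assumes one cache instance lives for the lifetime of a single input stream.

```python
from typing import Callable, Dict


class StreamKeyCache:
    """Hypothetical per-stream cache of decrypted data keys (not S3A code)."""

    def __init__(self, decrypt_key: Callable[[str], bytes]) -> None:
        # decrypt_key stands in for a KMS request, which is billed and may be throttled.
        self._decrypt_key = decrypt_key
        self._keys: Dict[str, bytes] = {}
        self.kms_calls = 0

    def key_for(self, key_id: str) -> bytes:
        # Only call out to "KMS" the first time this stream needs the key.
        if key_id not in self._keys:
            self.kms_calls += 1
            self._keys[key_id] = self._decrypt_key(key_id)
        return self._keys[key_id]


def fake_kms_decrypt(key_id: str) -> bytes:
    """Stand-in for a KMS request; returns deterministic fake 32-byte key material."""
    return key_id.encode("utf-8").ljust(32, b"\0")[:32]


cache = StreamKeyCache(fake_kms_decrypt)
# Simulate a random-IO read pattern: several ranged GETs on one stream.
for _offset in (0, 4096, 1_048_576, 512):
    cache.key_for("hypothetical-key-id")
```

With this shape, the four simulated reads trigger a single key lookup; whether the real client could safely hold key material for the life of a stream is a design question the sketch does not address.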
Issue Time Tracking
-------------------
Worklog Id: (was: 624631)
Time Spent: 9.5h (was: 9h 20m)
> Encrypt S3A data client-side with AWS SDK (S3-CSE)
> --------------------------------------------------
>
> Key: HADOOP-13887
> URL: https://issues.apache.org/jira/browse/HADOOP-13887
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Jeeyoung Kim
> Assignee: Igor Mazur
> Priority: Minor
> Labels: pull-request-available
> Attachments: HADOOP-13887-002.patch, HADOOP-13887-007.patch,
> HADOOP-13887-branch-2-003.patch, HADOOP-13897-branch-2-004.patch,
> HADOOP-13897-branch-2-005.patch, HADOOP-13897-branch-2-006.patch,
> HADOOP-13897-branch-2-008.patch, HADOOP-13897-branch-2-009.patch,
> HADOOP-13897-branch-2-010.patch, HADOOP-13897-branch-2-012.patch,
> HADOOP-13897-branch-2-014.patch, HADOOP-13897-trunk-011.patch,
> HADOOP-13897-trunk-013.patch, HADOOP-14171-001.patch, S3-CSE Proposal.pdf
>
> Time Spent: 9.5h
> Remaining Estimate: 0h
>
> Expose the client-side encryption option documented in Amazon S3
> documentation -
> http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
> Currently this is not exposed in Hadoop but it is exposed as an option in AWS
> Java SDK, which Hadoop currently includes. It should be trivial to propagate
> this as a parameter passed to the S3client used in S3AFileSystem.java