[
https://issues.apache.org/jira/browse/HADOOP-13887?focusedWorklogId=622235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622235
]
ASF GitHub Bot logged work on HADOOP-13887:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 13/Jul/21 20:58
Start Date: 13/Jul/21 20:58
Worklog Time Spent: 10m
Work Description: steveloughran commented on a change in pull request
#2706:
URL: https://github.com/apache/hadoop/pull/2706#discussion_r669093714
##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########
@@ -2029,3 +2029,51 @@ signer for all services.
For a specific service, the service specific signer is looked up first.
If that is not specified, the common signer is looked up. If this is
not specified as well, SDK settings are used.
+
+### <a name="cse"></a> Amazon S3 Client Side Encryption
+Amazon S3 Client Side Encryption(CSE), uses `AmazonS3EncryptionClientV2.java
+` AmazonS3 client. The encryption and decryption is done in AWS SDK side. Atm
Review comment:
"atm?"
Recommend an intro, features and limitations section:
A key reason this feature (HADOOP-13887) has been unavailable for a long
time is that the AWS S3 client pads uploaded objects with a 16 byte footer.
This meant that files were shorter when being read that when are listed them
through any of the list API calls/getFileStatus(). Which broke many
applications, including anything seeking near the end of a file to read a
footer, as ORC and Parquet do.
There is now a workaround: compensate for the footer in listings when CSE is
enabled.
* when listing files and directories, 16 bytes are subtracted from the
length of all non-empty objects.
* directory markers MAY be longer than 0 bytes long.
This "appears" to work; secondly it does in the testing to date. However,
the length of files when listed through the S3A client is now going to be
shorter than the length of files listed with other clients -including S3A
clients where S3-CSE has not been enabled.
features
* Supports client side encryption with keys managed in AWS KMS
* encryption settings propagated into jobs through any issued delegation
tokens
* encryption information stored as headers in the uploaded object
Limitations
* performance will be reduced. All encrypt/decrypt is now being done on the
client.
* writing files may be slower, as only a single block can be encrypted and
uploaded at a time
* Multipart Uploader API disabled
##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########
@@ -2029,3 +2029,51 @@ signer for all services.
For a specific service, the service specific signer is looked up first.
If that is not specified, the common signer is looked up. If this is
not specified as well, SDK settings are used.
+
+### <a name="cse"></a> Amazon S3 Client Side Encryption
Review comment:
move to encryption.md
##########
File path:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java
##########
@@ -112,15 +124,73 @@ public AmazonS3 createS3Client(
}
try {
- return buildAmazonS3Client(
- awsConf,
- parameters);
+ if (conf.get(CLIENT_SIDE_ENCRYPTION_METHOD) == null) {
+ return buildAmazonS3Client(
+ awsConf,
+ parameters);
+ } else {
+ return buildAmazonS3EncryptionClient(
+ awsConf,
+ parameters);
+ }
} catch (SdkClientException e) {
// SDK refused to build.
throw translateException("creating AWS S3 client", uri.toString(), e);
}
}
+ /**
+ * Create an {@link AmazonS3} client of type
+ * {@link AmazonS3EncryptionV2} if CSE is enabled.
+ *
+ * @param awsConf AWS configuration.
+ * @param parameters parameters
+ *
+ * @return new AmazonS3 client.
+ */
+ protected AmazonS3 buildAmazonS3EncryptionClient(
Review comment:
InconsistentS3Client doesn't do this; I can see tests skip it. Should we
have that client throw some Unsupported Exception here
##########
File path:
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractCommitITest.java
##########
@@ -179,11 +178,9 @@ public void setup() throws Exception {
if (useInconsistentClient()) {
AmazonS3 client = getFileSystem()
.getAmazonS3ClientForTesting("fault injection");
- Assert.assertTrue(
- "AWS client is not inconsistent, even though the test requirees it "
- + client,
- client instanceof InconsistentAmazonS3Client);
- inconsistentClient = (InconsistentAmazonS3Client) client;
+ if(client instanceof InconsistentAmazonS3Client) {
Review comment:
nit: add space after if
##########
File path:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java
##########
@@ -43,9 +52,12 @@
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.s3a.statistics.impl.AwsStatisticsCollector;
import org.apache.hadoop.fs.store.LogExactlyOnce;
+import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;
Review comment:
can you move up to L44. Yes, it its a PITA.
##########
File path:
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
##########
@@ -242,6 +242,7 @@ protected void logTimePerIOP(String operation,
@Test
public void testTimeToOpenAndReadWholeFileBlocks() throws Throwable {
+ skipIfClientSideEncryption();
Review comment:
why are we turning this off ? Performance? seek() perf?
##########
File path:
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md
##########
@@ -1682,3 +1682,105 @@ com.amazonaws.SdkClientException: Unable to execute
HTTP request:
When this happens, try to set `fs.s3a.connection.request.timeout` to a larger
value or disable it
completely by setting it to `0`.
+
+### <a name="client-side-encryption"></a> S3 Client Side Encryption
Review comment:
The troubleshooting documents need to assume that the person reading
them or googling for them has just seen a specific error message and is trying
to work out what's wrong/what to do.
This means the phrasing here needs to be tweaked slightly
* move these into the "encryption" section (ahead of those which clearly
shouldn't be there :)
* Add a ### subtitle for every stack trace
* with the error text the title of it -so it's the first thing people see
* interpretation after the stack trace
* and ideally a recommendation, if there's a simple solution (including
"don't")
### "Instruction file not found for S3 object"
Reading a file fails when the client is configured with S3-CSE
(insert <stack trace>)
The file is not encrypted with S3-CSE.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 622235)
Time Spent: 6.5h (was: 6h 20m)
> Encrypt S3A data client-side with AWS SDK (S3-CSE)
> --------------------------------------------------
>
> Key: HADOOP-13887
> URL: https://issues.apache.org/jira/browse/HADOOP-13887
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Jeeyoung Kim
> Assignee: Igor Mazur
> Priority: Minor
> Labels: pull-request-available
> Attachments: HADOOP-13887-002.patch, HADOOP-13887-007.patch,
> HADOOP-13887-branch-2-003.patch, HADOOP-13897-branch-2-004.patch,
> HADOOP-13897-branch-2-005.patch, HADOOP-13897-branch-2-006.patch,
> HADOOP-13897-branch-2-008.patch, HADOOP-13897-branch-2-009.patch,
> HADOOP-13897-branch-2-010.patch, HADOOP-13897-branch-2-012.patch,
> HADOOP-13897-branch-2-014.patch, HADOOP-13897-trunk-011.patch,
> HADOOP-13897-trunk-013.patch, HADOOP-14171-001.patch, S3-CSE Proposal.pdf
>
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> Expose the client-side encryption option documented in Amazon S3
> documentation -
> http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
> Currently this is not exposed in Hadoop but it is exposed as an option in AWS
> Java SDK, which Hadoop currently includes. It should be trivial to propagate
> this as a parameter passed to the S3client used in S3AFileSystem.java
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]