ivandika3 commented on code in PR #9294:
URL: https://github.com/apache/ozone/pull/9294#discussion_r2529464503


##########
hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/signature/StringToSignProducer.java:
##########
@@ -70,19 +80,47 @@ public final class StringToSignProducer {
       DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'")
           .withZone(ZoneOffset.UTC);
 
+  private static final Set<String> VALID_UNSIGNED_PAYLOADS;
+
+  static {
+    Set<String> set = new HashSet<>();
+    set.add(UNSIGNED_PAYLOAD);
+    set.add(STREAMING_UNSIGNED_PAYLOAD_TRAILER);
+    set.add(STREAMING_AWS4_HMAC_SHA256_PAYLOAD);
+    set.add(STREAMING_AWS4_HMAC_SHA256_PAYLOAD_TRAILER);
+    set.add(STREAMING_AWS4_ECDSA_P256_SHA256_PAYLOAD);
+    set.add(STREAMING_AWS4_ECDSA_P256_SHA256_PAYLOAD_TRAILER);
+    VALID_UNSIGNED_PAYLOADS = Collections.unmodifiableSet(set);
+  }
+
   private StringToSignProducer() {
   }
 
   public static String createSignatureBase(
       SignatureInfo signatureInfo,
       ContainerRequestContext context
   ) throws Exception {
-    return createSignatureBase(signatureInfo,
+    LowerCaseKeyStringMap lowerCaseKeyStringMap = 
LowerCaseKeyStringMap.fromHeaderMap(context.getHeaders());
+    String signatureBase = createSignatureBase(signatureInfo,
         context.getUriInfo().getRequestUri().getScheme(),
         context.getMethod(),
-        LowerCaseKeyStringMap.fromHeaderMap(context.getHeaders()),
+        lowerCaseKeyStringMap,
         fromMultiValueToSingleValueMap(
             context.getUriInfo().getQueryParameters()));
+
+    // validate x-amz-content-sha256
+    String payloadHash = getPayloadHash(lowerCaseKeyStringMap, 
!signatureInfo.isSignPayload());
+    if (!VALID_UNSIGNED_PAYLOADS.contains(payloadHash)) {
+      byte[] payload = IOUtils.toByteArray(context.getEntityStream());
+      context.setEntityStream(new ByteArrayInputStream(payload));

Review Comment:
   > I've looked into this, and it seems that avoiding OOM would usually 
require writing the request body to a temp file.
   
   > I'm not sure if that's the right approach here. If we want to decide based 
on file size, we might need a new config parameter, which adds extra complexity.
   
   From the HDDS-13668 ticket description, my idea is that we need to wrap 
`DigestInputStream` that calculates SHA-256 on top of the existing 
`DigestInputStream` that calculates the md5 hash (for ETag) so that we can 
calculate the digest in streaming fashion instead loading the data twice.
   
   After all the data has been streamed to datanodes and we want to commit the 
key in `KeyOutputStream#close` or `KeyOutputStream#closeInternal`, we can run 
`MessageDigest#digest` can validate with the "x-amz-content-sha256" header. If 
the digest does not match, we throw an exception without committing the key. 
The idea is similar to https://github.com/apache/ozone/pull/8506. The key that 
is not committed will be cleaned in background after few days by the 
`OpenKeyCleanupService` so there should not be any orphan blocks.
   
   Therefore, the digest calculation should not be located in 
`StringToSignProducer`. I think the TODO in the `StringToSignProducer` is 
misleading (it was from long ago when the initial S3G initial implementation 
was created).
   
   Please let me know what you think. I can help come up with some 
implementation when I have time.
   
   > Not sure if handling this in a memory-safe way should be part of this PR 
or left for future improvement.
   
   I think we should address it here. For large objects, loading the data to 
memory will easily cause OOM which causes S3G to be unstable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to