ivandika3 commented on code in PR #9294:
URL: https://github.com/apache/ozone/pull/9294#discussion_r2529464503


##########
hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/signature/StringToSignProducer.java:
##########
@@ -70,19 +80,47 @@ public final class StringToSignProducer {
       DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'")
           .withZone(ZoneOffset.UTC);
 
+  private static final Set<String> VALID_UNSIGNED_PAYLOADS;
+
+  static {
+    Set<String> set = new HashSet<>();
+    set.add(UNSIGNED_PAYLOAD);
+    set.add(STREAMING_UNSIGNED_PAYLOAD_TRAILER);
+    set.add(STREAMING_AWS4_HMAC_SHA256_PAYLOAD);
+    set.add(STREAMING_AWS4_HMAC_SHA256_PAYLOAD_TRAILER);
+    set.add(STREAMING_AWS4_ECDSA_P256_SHA256_PAYLOAD);
+    set.add(STREAMING_AWS4_ECDSA_P256_SHA256_PAYLOAD_TRAILER);
+    VALID_UNSIGNED_PAYLOADS = Collections.unmodifiableSet(set);
+  }
+
   private StringToSignProducer() {
   }
 
   public static String createSignatureBase(
       SignatureInfo signatureInfo,
       ContainerRequestContext context
   ) throws Exception {
-    return createSignatureBase(signatureInfo,
+    LowerCaseKeyStringMap lowerCaseKeyStringMap = 
LowerCaseKeyStringMap.fromHeaderMap(context.getHeaders());
+    String signatureBase = createSignatureBase(signatureInfo,
         context.getUriInfo().getRequestUri().getScheme(),
         context.getMethod(),
-        LowerCaseKeyStringMap.fromHeaderMap(context.getHeaders()),
+        lowerCaseKeyStringMap,
         fromMultiValueToSingleValueMap(
             context.getUriInfo().getQueryParameters()));
+
+    // validate x-amz-content-sha256
+    String payloadHash = getPayloadHash(lowerCaseKeyStringMap, 
!signatureInfo.isSignPayload());
+    if (!VALID_UNSIGNED_PAYLOADS.contains(payloadHash)) {
+      byte[] payload = IOUtils.toByteArray(context.getEntityStream());
+      context.setEntityStream(new ByteArrayInputStream(payload));

Review Comment:
   > I've looked into this, and it seems that avoiding OOM would usually 
require writing the request body to a temp file.
   
   > I'm not sure if that's the right approach here. If we want to decide based 
on file size, we might need a new config parameter, which adds extra complexity.
   
   From the HDDS-13668 ticket description, my idea is that we need to wrap 
`DigestInputStream` that calculates SHA-256 on top of the existing 
DigestInputStream that calculates the md5 hash in streaming fashion instead of 
the data twice.
   
   Afterwards in `KeyOutputStream#close` or `KeyOutputStream#closeInternal`, we 
can run `MessageDigest#digest` can validate with the "x-amz-content-sha256" 
header. If the digest does not match, we throw an exception without committing 
the key. The idea is similar to https://github.com/apache/ozone/pull/8506.
   
   Therefore, the digest calculation should not be located in 
`StringToSignProducer`.
   
   Please let me know what you think. I can help come up with some 
implementation when I have time.
   
   > Not sure if handling this in a memory-safe way should be part of this PR 
or left for future improvement.
   
   I think we should address it here. For large objects, loading the data to 
memory will easily cause OOM which causes S3G to be unstable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to