hevinhsu commented on code in PR #9294:
URL: https://github.com/apache/ozone/pull/9294#discussion_r2533062646


##########
hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/signature/StringToSignProducer.java:
##########
@@ -70,19 +80,47 @@ public final class StringToSignProducer {
       DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'")
           .withZone(ZoneOffset.UTC);
 
+  private static final Set<String> VALID_UNSIGNED_PAYLOADS;
+
+  static {
+    Set<String> set = new HashSet<>();
+    set.add(UNSIGNED_PAYLOAD);
+    set.add(STREAMING_UNSIGNED_PAYLOAD_TRAILER);
+    set.add(STREAMING_AWS4_HMAC_SHA256_PAYLOAD);
+    set.add(STREAMING_AWS4_HMAC_SHA256_PAYLOAD_TRAILER);
+    set.add(STREAMING_AWS4_ECDSA_P256_SHA256_PAYLOAD);
+    set.add(STREAMING_AWS4_ECDSA_P256_SHA256_PAYLOAD_TRAILER);
+    VALID_UNSIGNED_PAYLOADS = Collections.unmodifiableSet(set);
+  }
+
   private StringToSignProducer() {
   }
 
   public static String createSignatureBase(
       SignatureInfo signatureInfo,
       ContainerRequestContext context
   ) throws Exception {
-    return createSignatureBase(signatureInfo,
+    LowerCaseKeyStringMap lowerCaseKeyStringMap = 
LowerCaseKeyStringMap.fromHeaderMap(context.getHeaders());
+    String signatureBase = createSignatureBase(signatureInfo,
         context.getUriInfo().getRequestUri().getScheme(),
         context.getMethod(),
-        LowerCaseKeyStringMap.fromHeaderMap(context.getHeaders()),
+        lowerCaseKeyStringMap,
         fromMultiValueToSingleValueMap(
             context.getUriInfo().getQueryParameters()));
+
+    // validate x-amz-content-sha256
+    String payloadHash = getPayloadHash(lowerCaseKeyStringMap, 
!signatureInfo.isSignPayload());
+    if (!VALID_UNSIGNED_PAYLOADS.contains(payloadHash)) {
+      byte[] payload = IOUtils.toByteArray(context.getEntityStream());
+      context.setEntityStream(new ByteArrayInputStream(payload));

Review Comment:
   Thanks @ivandika3 for the detailed feedback and great reference!
   
   My initial idea is to introduce a `MultiDigestInputStream` to handle 
multiple digests in a single pass.  
   We can maintain a `Map<String, MessageDigest>` internally to keep track of 
digest instances by name (e.g., "MD5", "SHA-256").
   
   This way, we can reuse the same stream for both ETag (MD5) and SHA-256 
validation, and later call `multiDigestStream.getDigest("SHA-256")` to retrieve 
the computed value.
   
   For validation, I plan to add the check around 
[`ObjectEndpoint.java#L325`](https://github.com/apache/ozone/blob/master/hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java#L325)
 — after the stream is fully consumed and closed — and compare it with the 
`x-amz-content-sha256` header.  
   If they don’t match, throw an exception and abort the commit, similar to the 
approach in [PR 
#8506](https://github.com/apache/ozone/pull/8506/files#diff-bd6abd262ed8e24eaeea5cb0e7c2a1a128411b284f9f27dae1ad47be6717a940R346-R351).
   
   This should allow:
   - Streaming processing without loading the entire payload into memory 
(avoiding OOM)
   - Reusing the existing MD5 logic
   
   What do you think about this direction? Any concerns or better integration 
points?  
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to