hevinhsu commented on code in PR #9294:
URL: https://github.com/apache/ozone/pull/9294#discussion_r2533062646
##########
hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/signature/StringToSignProducer.java:
##########
@@ -70,19 +80,47 @@ public final class StringToSignProducer {
DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'")
.withZone(ZoneOffset.UTC);
+ private static final Set<String> VALID_UNSIGNED_PAYLOADS;
+
+ static {
+ Set<String> set = new HashSet<>();
+ set.add(UNSIGNED_PAYLOAD);
+ set.add(STREAMING_UNSIGNED_PAYLOAD_TRAILER);
+ set.add(STREAMING_AWS4_HMAC_SHA256_PAYLOAD);
+ set.add(STREAMING_AWS4_HMAC_SHA256_PAYLOAD_TRAILER);
+ set.add(STREAMING_AWS4_ECDSA_P256_SHA256_PAYLOAD);
+ set.add(STREAMING_AWS4_ECDSA_P256_SHA256_PAYLOAD_TRAILER);
+ VALID_UNSIGNED_PAYLOADS = Collections.unmodifiableSet(set);
+ }
+
private StringToSignProducer() {
}
public static String createSignatureBase(
SignatureInfo signatureInfo,
ContainerRequestContext context
) throws Exception {
- return createSignatureBase(signatureInfo,
+ LowerCaseKeyStringMap lowerCaseKeyStringMap =
LowerCaseKeyStringMap.fromHeaderMap(context.getHeaders());
+ String signatureBase = createSignatureBase(signatureInfo,
context.getUriInfo().getRequestUri().getScheme(),
context.getMethod(),
- LowerCaseKeyStringMap.fromHeaderMap(context.getHeaders()),
+ lowerCaseKeyStringMap,
fromMultiValueToSingleValueMap(
context.getUriInfo().getQueryParameters()));
+
+ // validate x-amz-content-sha256
+ String payloadHash = getPayloadHash(lowerCaseKeyStringMap,
!signatureInfo.isSignPayload());
+ if (!VALID_UNSIGNED_PAYLOADS.contains(payloadHash)) {
+ byte[] payload = IOUtils.toByteArray(context.getEntityStream());
+ context.setEntityStream(new ByteArrayInputStream(payload));
Review Comment:
Thanks @ivandika3 for the detailed feedback and great reference!
My initial idea is to introduce a `MultiDigestInputStream` to handle
multiple digests in a single pass.
We can maintain a `Map<String, MessageDigest>` internally to keep track of
digest instances by name (e.g., "MD5", "SHA-256").
This way, we can reuse the same stream for both ETag (MD5) and SHA-256
validation, and later call `multiDigestStream.getDigest("SHA-256")` to retrieve
the computed value.
For validation, I plan to add the check around
[`ObjectEndpoint.java#L325`](https://github.com/apache/ozone/blob/master/hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java#L325)
— after the stream is fully consumed and closed — and compare it with the
`x-amz-content-sha256` header.
If they don’t match, throw an exception and abort the commit, similar to the
approach in [PR
#8506](https://github.com/apache/ozone/pull/8506/files#diff-bd6abd262ed8e24eaeea5cb0e7c2a1a128411b284f9f27dae1ad47be6717a940R346-R351).
This should allow:
- Streaming processing without loading the entire payload into memory
(avoiding OOM)
- Reusing the existing MD5 logic
What do you think about this direction? Any concerns or better integration
points?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]