ivandika3 commented on code in PR #9294:
URL: https://github.com/apache/ozone/pull/9294#discussion_r2529464503
##########
hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/signature/StringToSignProducer.java:
##########
@@ -70,19 +80,47 @@ public final class StringToSignProducer {
DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'")
.withZone(ZoneOffset.UTC);
+ private static final Set<String> VALID_UNSIGNED_PAYLOADS;
+
+ static {
+ Set<String> set = new HashSet<>();
+ set.add(UNSIGNED_PAYLOAD);
+ set.add(STREAMING_UNSIGNED_PAYLOAD_TRAILER);
+ set.add(STREAMING_AWS4_HMAC_SHA256_PAYLOAD);
+ set.add(STREAMING_AWS4_HMAC_SHA256_PAYLOAD_TRAILER);
+ set.add(STREAMING_AWS4_ECDSA_P256_SHA256_PAYLOAD);
+ set.add(STREAMING_AWS4_ECDSA_P256_SHA256_PAYLOAD_TRAILER);
+ VALID_UNSIGNED_PAYLOADS = Collections.unmodifiableSet(set);
+ }
+
private StringToSignProducer() {
}
public static String createSignatureBase(
SignatureInfo signatureInfo,
ContainerRequestContext context
) throws Exception {
- return createSignatureBase(signatureInfo,
+ LowerCaseKeyStringMap lowerCaseKeyStringMap =
LowerCaseKeyStringMap.fromHeaderMap(context.getHeaders());
+ String signatureBase = createSignatureBase(signatureInfo,
context.getUriInfo().getRequestUri().getScheme(),
context.getMethod(),
- LowerCaseKeyStringMap.fromHeaderMap(context.getHeaders()),
+ lowerCaseKeyStringMap,
fromMultiValueToSingleValueMap(
context.getUriInfo().getQueryParameters()));
+
+ // validate x-amz-content-sha256
+ String payloadHash = getPayloadHash(lowerCaseKeyStringMap,
!signatureInfo.isSignPayload());
+ if (!VALID_UNSIGNED_PAYLOADS.contains(payloadHash)) {
+ byte[] payload = IOUtils.toByteArray(context.getEntityStream());
+ context.setEntityStream(new ByteArrayInputStream(payload));
Review Comment:
> I've looked into this, and it seems that avoiding OOM would usually
require writing the request body to a temp file.
> I'm not sure if that's the right approach here. If we want to decide based
on file size, we might need a new config parameter, which adds extra complexity.
From the HDDS-13668 ticket description, my idea is that we need to wrap
`DigestInputStream` that calculates SHA-256 on top of the existing
DigestInputStream that calculates the md5 hash in streaming fashion instead of
the data twice.
Afterwards in `KeyOutputStream#close` or `KeyOutputStream#closeInternal`, we
can run `MessageDigest#digest` can validate with the "x-amz-content-sha256"
header.
Therefore, the digest calculation should not be located in
`StringToSignProducer`.
Please let me know what you think. I can help come up with some
implementation when I have time.
> Not sure if handling this in a memory-safe way should be part of this PR
or left for future improvement.
I think we should address it here. For large objects, loading the data to
memory will easily cause OOM which causes S3G to be unstable.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]