plusplusjiajia commented on PR #3120: URL: https://github.com/apache/iceberg-python/pull/3120#issuecomment-4102841635
> thanks for the PR! i understand this is to align with the java SigV4 implementation. Could you help me understand the specific scenario in which this is currently breaking? (i dont know much about sigv4)

That's a great question. Let me walk through the context.

The root cause is in how the Java Iceberg SDK computes the `x-amz-content-sha256` header. It uses AWS SDK v2's `SignerChecksumParams` with `Algorithm.SHA256` and sets the `checksumHeaderName` to [X-Amz-Content-SHA256](https://github.com/apache/iceberg/blob/fec9800bc/aws/src/main/java/org/apache/iceberg/aws/RESTSigV4AuthSession.java#L100-L104). Internally, the AWS SDK's [AbstractAws4Signer](https://github.com/aws/aws-sdk-java-v2/blob/master/core/auth/src/main/java/software/amazon/awssdk/auth/signer/internal/AbstractAws4Signer.java) applies `BinaryUtils.toBase64()` to the checksum before writing it into the specified header. This is part of the flexible checksum mechanism rather than standard SigV4 behavior, so the base64 encoding in `x-amz-content-sha256` is essentially a side effect of Java Iceberg using the flexible checksum API.

For empty bodies, the Java side already has a special case ([RESTSigV4AuthSession.java#L119-L121](https://github.com/apache/iceberg/blob/fec9800bc/aws/src/main/java/org/apache/iceberg/aws/RESTSigV4AuthSession.java#L119-L121)) that overrides this with the standard hex value. For non-empty bodies, the base64 value is left as-is (confirmed by the test at [TestRESTSigV4AuthSession.java#L174](https://github.com/apache/iceberg/blob/a89f1f9aa/aws/src/test/java/org/apache/iceberg/aws/TestRESTSigV4AuthSession.java#L174)).

Since `x-amz-content-sha256` is a signed header, its value participates in the canonical request construction. When a REST catalog server built with the Java Iceberg SDK verifies incoming signatures, it expects the same base64-encoded value. If the Python client sends a hex-encoded value instead, the canonical headers won't match during server-side signature verification, resulting in a signature mismatch.
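
To make the difference concrete, here is a minimal sketch of the two encodings using only the Python standard library. The function names are hypothetical and chosen for illustration; this is not the code in this PR or in PyIceberg, just the same SHA-256 digest rendered as hex (standard SigV4) and as base64 (what the Java side ends up sending for non-empty bodies, per the description above).

```python
import base64
import hashlib

# Hex digest of an empty payload: the standard SigV4 value that the Java
# side is described as falling back to for empty bodies.
EMPTY_BODY_SHA256_HEX = hashlib.sha256(b"").hexdigest()


def content_sha256_hex(body: bytes) -> str:
    """Standard SigV4 style: lowercase hex of the SHA-256 digest."""
    return hashlib.sha256(body).hexdigest()


def content_sha256_base64(body: bytes) -> str:
    """Flexible-checksum style: base64 of the raw SHA-256 digest."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def java_compatible_content_sha256(body: bytes) -> str:
    """Hypothetical helper mirroring the behavior described above:
    hex for empty bodies, base64 otherwise."""
    if not body:
        return EMPTY_BODY_SHA256_HEX
    return content_sha256_base64(body)


if __name__ == "__main__":
    payload = b'{"name": "example"}'  # made-up request body
    print("hex:   ", content_sha256_hex(payload))
    print("base64:", content_sha256_base64(payload))
```

Both strings encode the same 32-byte digest, but they differ as header values (64 hex characters versus 44 base64 characters), so a signature computed over one will not verify against a canonical request built with the other.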

This PR aligns the Python implementation with the Java SDK's current behavior to ensure interoperability. That said, I agree it would be worth discussing whether the Java side should also be updated to use standard hex encoding, but that would need to be a coordinated change across both implementations. Happy to hear your thoughts on this!
