steveloughran commented on PR #15171: URL: https://github.com/apache/iceberg/pull/15171#issuecomment-3997316394
The current cache key strips x-amz-date (a defence against replay attacks which we need to turn off here), range, and two headers the SDK attaches just for AWS request debugging: amz-sdk-invocation-id (a per-SDK-operation UUID) and amz-sdk-retry (an incrementing count of the attempts made for that SDK operation).

It looks designed to support the repeated ranged GET calls that Parquet and ORC reads make of the same file, as well as split reads of Avro files. That minimises the cost to the catalog of this heavy IO and, on the client, the delays of the requests.

But as you note, the cache key doesn't include extra stuff that the SDK may add now and in future: `x-amz-content-sha256`, the S3 Express session ref and probably more. Some of these may end up signed, and so need to be included unchanged in the second request.

Have you hit any specific ones causing replay issues? I know things will not work using this signer under the s3a connector (the referer header gets audit info, and on GET there is also the version ID/etag as we seek round a file, for resilience against and detection of overwrites).
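
To make the header stripping described above concrete, here is a minimal Python sketch. It is not the code in the PR: the function name `cache_key`, the hashing scheme, and the example URI and header values are all hypothetical, and it assumes that every header outside the four named above takes part in the key.

```python
import hashlib

# The four headers the comment says are stripped from the cache key.
STRIPPED_HEADERS = frozenset({
    "x-amz-date",
    "range",
    "amz-sdk-invocation-id",
    "amz-sdk-retry",
})


def cache_key(method, uri, headers):
    """Hypothetical key: method, URI and every header not in STRIPPED_HEADERS.

    Assumption for illustration only: all remaining headers are hashed
    into the key. The real implementation may choose differently.
    """
    kept = sorted(
        (name.lower(), value)
        for name, value in headers.items()
        if name.lower() not in STRIPPED_HEADERS
    )
    digest = hashlib.sha256()
    digest.update(method.upper().encode("utf-8"))
    digest.update(b"\n")
    digest.update(uri.encode("utf-8"))
    for name, value in kept:
        digest.update(f"\n{name}:{value}".encode("utf-8"))
    return digest.hexdigest()


# Placeholder values: two ranged GETs of the same object.
URI = "https://example-bucket.s3.amazonaws.com/data/file.parquet"
first_get = {
    "Host": "example-bucket.s3.amazonaws.com",
    "Range": "bytes=0-1023",
    "x-amz-date": "20250101T000000Z",
    "amz-sdk-invocation-id": "11111111-1111-1111-1111-111111111111",
    "amz-sdk-retry": "0",
}
second_get = {
    "Host": "example-bucket.s3.amazonaws.com",
    "Range": "bytes=4096-8191",
    "x-amz-date": "20250101T000005Z",
    "amz-sdk-invocation-id": "22222222-2222-2222-2222-222222222222",
    "amz-sdk-retry": "1",
}

# Both requests differ only in stripped headers, so they share one key.
same_key = cache_key("GET", URI, first_get) == cache_key("GET", URI, second_get)
```

Under that assumption, any other header whose value changes per request (for example a payload hash or a session reference) would produce a different key on every call. If such a header were stripped as well, the key would match again, but the cached signature would no longer cover the value the client actually sends.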
