steveloughran commented on PR #15171:
URL: https://github.com/apache/iceberg/pull/15171#issuecomment-3997316394

   The current cache key strips x-amz-date (a defence against replay attacks, 
which we need to turn off here), range, and two headers the SDK attaches purely 
for AWS request debugging: amz-sdk-invocation-id (a UUID per SDK operation) and 
amz-sdk-retry (an incrementing count of the attempts made for that SDK 
operation). It looks designed to support the repeated ranged GET calls that 
Parquet and ORC reads make against the same file, as well as split reads of 
Avro files. 
   
   That minimises the cost to the catalog of this heavy IO and, on the client, 
the delays of those requests. 
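   
   To make the idea concrete, here is a minimal sketch of a key built that way. 
It is not the signer's actual code: `cache_key` and `VOLATILE_HEADERS` are 
hypothetical names, the URL is a placeholder, and keeping every other header in 
the key is an assumption of the sketch.

```python
from typing import Mapping, Tuple

# Headers dropped from the key, as listed above (hypothetical constant name).
VOLATILE_HEADERS = frozenset({
    "x-amz-date",
    "range",
    "amz-sdk-invocation-id",
    "amz-sdk-retry",
})


def cache_key(method: str, url: str, headers: Mapping[str, str]) -> Tuple:
    """Build a hashable key that ignores the volatile headers.

    Assumption of this sketch: all remaining headers stay in the key.
    """
    kept = sorted(
        (name.lower(), value)
        for name, value in headers.items()
        if name.lower() not in VOLATILE_HEADERS
    )
    return (method.upper(), url, tuple(kept))


# Two ranged GETs of the same object: different range, date and SDK ids.
first = cache_key("GET", "https://bucket.example/data.parquet", {
    "Range": "bytes=0-1023",
    "x-amz-date": "20250101T000000Z",
    "amz-sdk-invocation-id": "aaaa",
    "amz-sdk-retry": "0",
})
second = cache_key("GET", "https://bucket.example/data.parquet", {
    "Range": "bytes=4096-8191",
    "x-amz-date": "20250101T000005Z",
    "amz-sdk-invocation-id": "bbbb",
    "amz-sdk-retry": "1",
})
```

   Here `first == second`, so the second ranged read could reuse whatever was 
cached for the first.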
   
   But as you note, the cache key doesn't include the extra headers that the 
SDK may add now and in future: `x-amz-content-sha256`, the s3express session 
ref and probably more. Some of these may end up signed, and so need to be 
included unchanged in the second request.
   
   Have you hit any specific ones causing replay issues? I know this signer 
will not work under the s3a connector: the referer header carries audit info, 
and each GET also has a version id/etag as we seek round a file, for 
resilience against, or detection of, overwrites. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

