steveloughran commented on PR #15171: URL: https://github.com/apache/iceberg/pull/15171#issuecomment-3990245139
I've been doing something close to this in #15428, but without trying to cache all headers. The cache must first strip out all headers which haven't been signed, which can be done using the `SignedHeaders` field of the `Authorization` header or the `X-Amz-SignedHeaders` query param (if there are any cases where that is used). That way a single cached GET signature for a file can serve requests for two different ranges from the cache.

The current design doesn't handle signed headers changing, or an unsigned header from the first request not being needed in the second. For example, if a GET with range 0-1000 is in the cache, you can't issue a GET without a range; instead the original range is always attached.

This design will (correctly) avoid issuing invalid requests when a signed header has changed, but by caching all headers it will trigger cache misses and a catalog signing request _even for headers which can be safely changed_. Key example: if the cache has a GET of range 996-1000, a follow-up GET of range 900-1000 is going to trigger a new signing request. As this is often the GET pattern of a parquet file footer read, caching all headers will hurt client performance as well as put a lot more load on the catalog.
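To make the idea concrete, here is a minimal sketch of a cache that matches only on signed headers. It is not the code from either PR: `SignatureCache` and `signed_header_names` are hypothetical names, and it assumes the signer returns a SigV4 `Authorization` header containing a `SignedHeaders=` field. It also assumes headers the signer adds itself (such as `x-amz-date`) are absent from the caller's request both at signing time and at lookup time.

```python
import re
from typing import Dict, List, Mapping, Optional, Tuple

# SigV4 Authorization headers carry "SignedHeaders=a;b;c" between commas.
_SIGNED_HEADERS_RE = re.compile(r"SignedHeaders=([^,\s]+)")


def signed_header_names(authorization: str) -> Tuple[str, ...]:
    """Return the lower-cased header names listed in SignedHeaders."""
    match = _SIGNED_HEADERS_RE.search(authorization)
    if match is None:
        return ()
    return tuple(n.lower() for n in match.group(1).split(";") if n)


def _lowered(headers: Mapping[str, str]) -> Dict[str, str]:
    """Normalise header names to lower case for comparison."""
    return {k.lower(): v.strip() for k, v in headers.items()}


class SignatureCache:
    """Hypothetical cache keyed by (method, uri), matched on signed headers only."""

    def __init__(self) -> None:
        self._entries: Dict[
            Tuple[str, str],
            List[Tuple[Dict[str, Optional[str]], Mapping[str, str]]],
        ] = {}

    def put(self, method: str, uri: str,
            request_headers: Mapping[str, str],
            signed_response_headers: Mapping[str, str]) -> None:
        """Store a signing result, remembering only the signed request headers."""
        auth = _lowered(signed_response_headers).get("authorization", "")
        request = _lowered(request_headers)
        # Unsigned headers (e.g. an unsigned Range) are dropped from the match set.
        signed_values = {n: request.get(n) for n in signed_header_names(auth)}
        key = (method.upper(), uri)
        self._entries.setdefault(key, []).append(
            (signed_values, signed_response_headers))

    def get(self, method: str, uri: str,
            request_headers: Mapping[str, str]) -> Optional[Mapping[str, str]]:
        """Return a cached result if every signed header has the same value."""
        request = _lowered(request_headers)
        for signed_values, response in self._entries.get((method.upper(), uri), []):
            if all(request.get(n) == v for n, v in signed_values.items()):
                return response
        return None
```

With this shape, a cached signature for a GET of range 996-1000 also serves a GET of range 900-1000 as long as `range` was not in `SignedHeaders`. If `range` was signed, a request with a different range, or with no range at all, misses the cache and goes back to the catalog rather than reusing an invalid signature.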
