steveloughran commented on PR #15171:
URL: https://github.com/apache/iceberg/pull/15171#issuecomment-3990245139

   I've been doing something close to this in #15428, but without trying to 
cache all headers. 
   
   The cache must first strip out all headers which haven't been signed, which 
can be done using the `SignedHeaders` field of the `Authorization` header or 
the `X-Amz-SignedHeaders` query param (if there are any cases where that is 
used).
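
   A minimal sketch of that stripping step, assuming a SigV4-style 
`Authorization` value. The helper names `signed_header_names` and 
`strip_unsigned` are hypothetical and are not taken from either PR:

```python
import re


def signed_header_names(authorization: str) -> list[str]:
    """Return the header names listed in the SignedHeaders field.

    Assumes a SigV4-style value such as
    "AWS4-HMAC-SHA256 Credential=..., SignedHeaders=host;x-amz-date, Signature=..."
    """
    match = re.search(r"SignedHeaders=([^,\s]+)", authorization)
    if match is None:
        return []
    return match.group(1).lower().split(";")


def strip_unsigned(headers: dict[str, str], authorization: str) -> dict[str, str]:
    """Keep only the headers covered by the signature (names lower-cased)."""
    signed = set(signed_header_names(authorization))
    return {k.lower(): v for k, v in headers.items() if k.lower() in signed}
```

   Only the output of `strip_unsigned` would then take part in the cache key, 
so unsigned headers such as a range cannot cause a miss.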
   
   That way a single cached signature for a GET on a file can serve requests 
for two different ranges from the cache. 
   
   The current design doesn't handle signed headers changing, or an unsigned 
header from the first request not being needed in the second. For example, if a 
GET with range 0-1000 is in the cache, you can't issue a GET without a 
range; instead the original range is always attached.
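
   One way to avoid replaying a stale unsigned header, sketched under the 
assumption that the cache stores only the headers returned by the signer. 
`apply_cached_signature` and its parameters are hypothetical names:

```python
def apply_cached_signature(new_headers: dict[str, str],
                           signer_headers: dict[str, str]) -> dict[str, str]:
    """Combine the caller's own headers with cached signer output.

    Unsigned headers (for example Range) come only from the new request,
    so a range used by an earlier request is never re-attached.
    """
    merged = {k.lower(): v for k, v in new_headers.items()}
    # Signer-provided values win for any header the signer returned.
    merged.update({k.lower(): v for k, v in signer_headers.items()})
    return merged
```

   With this shape, a follow-up GET that carries no range yields a request 
with no range header, even if the cached entry was created by a ranged GET.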
   
   This design will (correctly) avoid issuing invalid requests when a signed 
header has changed, but by caching all headers it will trigger cache misses 
and a catalog signing request _even for headers which can be safely changed_.
   
   Key example: if the cache has a GET of range 996-1000, a followup GET of 
range 900-1000 is going to trigger a new signing request. As this is often the 
GET path of a parquet file footer read, caching all headers will hurt client 
performance as well as put a lot more load on the catalog.
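
   The difference between the two keying strategies can be illustrated with a 
small sketch. It assumes the range header is not among the signed headers; 
`cache_key`, the URI and the header values are made up for illustration:

```python
def cache_key(method, uri, headers, signed_names=None):
    """Build a hashable cache key.

    If signed_names is given, only those headers take part in the key;
    otherwise every header does.
    """
    items = {k.lower(): v for k, v in headers.items()}
    if signed_names is not None:
        keep = {n.lower() for n in signed_names}
        items = {k: v for k, v in items.items() if k in keep}
    return (method, uri, tuple(sorted(items.items())))


uri = "https://bucket.s3.amazonaws.com/data/file.parquet"
signed = ["host"]  # assumption: Range is not signed
first = {"Host": "bucket.s3.amazonaws.com", "Range": "bytes=996-1000"}
second = {"Host": "bucket.s3.amazonaws.com", "Range": "bytes=900-1000"}

# Keying on all headers: the second footer read misses the cache.
all_headers_hit = cache_key("GET", uri, first) == cache_key("GET", uri, second)

# Keying on signed headers only: the second footer read is a cache hit.
signed_only_hit = (cache_key("GET", uri, first, signed)
                   == cache_key("GET", uri, second, signed))
```

   Here `all_headers_hit` is `False` and `signed_only_hit` is `True`, which is 
the footer-read case described above.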
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


