bryanck commented on PR #8555:
URL: https://github.com/apache/iceberg/pull/8555#issuecomment-1722270134

   > Normally, I would think table load (dozens or hundreds of milliseconds?) 
is probably an order of magnitude or two more expensive than credentials 
refresh (a few milliseconds?). This is where I share the same concern that 
@pvary raised about every task manager refreshing/loading tables periodically.
   > 
   > Can you elaborate a little on how the REST catalog makes table load efficient?
   > 
   
   We've implemented several optimizations in our catalog so table load 
responses are around 24ms, including our API auth checks. We have refs-only 
mode on by default. We're continuing work to make that even more efficient.
   
   > I also echo @pvary's comment on the confusion between `TableLoader` and 
`TableSupplier`. There are actually some questions on the design of 
`TableLoader` (e.g. [not handling close 
well](https://github.com/apache/iceberg/pull/6614)). I would be in favor of a 
refactor/redesign of the table loader/catalog lifecycle in Flink. But that will 
probably be a big discussion by itself.
   
   I agree here. It is possible to reuse the TableLoader interface with a 
couple of changes if that seems acceptable.
   
   > 
   > I also thought you use pre-signed URLs from the REST catalog. With that, 
what is the credentials expiration problem here? S3 session credentials also 
have auto-refresh capability. Hence, I am trying to understand more about the 
credentials expiration and why it is tied to the table object and needs a refresh.
   
   Refreshing credentials with the REST API can be done using OAuth2, which we 
use with the S3 signer, or using a custom mechanism that passes credentials in 
the table config response, which are then passed to the FileIO during 
initialization. We use the latter mechanism for GCP tokens, Azure SAS tokens, 
S3 SigV4 exchange, and S3 vended credentials.
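
To illustrate why the second mechanism ties credential refresh to the table object, here is a minimal sketch. Every class, method, and property key in it (`SketchFileIO`, `SketchTable`, `RefreshingTableSupplier`, `sketch.token`) is hypothetical and is not an Iceberg or Flink API. It only assumes that credentials arrive in the table load response config and are handed to the file IO once, at initialization, so picking up fresh credentials means loading the table again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class CredentialRefreshSketch {

  /** Hypothetical stand-in for a FileIO that receives credentials only at initialization. */
  static class SketchFileIO {
    private Map<String, String> props = new HashMap<>();

    void initialize(Map<String, String> properties) {
      this.props = new HashMap<>(properties);
    }

    String token() {
      return props.get("sketch.token"); // hypothetical property key
    }
  }

  /** Hypothetical loaded table whose FileIO is initialized from the load response config. */
  static class SketchTable {
    final SketchFileIO io = new SketchFileIO();
    final long loadedAtMillis;

    SketchTable(Map<String, String> configFromLoadResponse, long nowMillis) {
      io.initialize(configFromLoadResponse);
      this.loadedAtMillis = nowMillis;
    }
  }

  /** Reloads the table, and with it the credentials, once the refresh interval has passed. */
  static class RefreshingTableSupplier {
    private final Supplier<Map<String, String>> loadTableConfig; // stands in for the catalog call
    private final long refreshIntervalMillis;
    private SketchTable current;

    RefreshingTableSupplier(Supplier<Map<String, String>> loadTableConfig,
                            long refreshIntervalMillis) {
      this.loadTableConfig = loadTableConfig;
      this.refreshIntervalMillis = refreshIntervalMillis;
    }

    synchronized SketchTable get(long nowMillis) {
      if (current == null || nowMillis - current.loadedAtMillis >= refreshIntervalMillis) {
        current = new SketchTable(loadTableConfig.get(), nowMillis);
      }
      return current;
    }
  }

  public static void main(String[] args) {
    int[] loads = {0};
    RefreshingTableSupplier supplier = new RefreshingTableSupplier(() -> {
      loads[0]++;
      Map<String, String> config = new HashMap<>();
      config.put("sketch.token", "token-" + loads[0]); // each load vends a new token
      return config;
    }, 1000L);

    System.out.println(supplier.get(0L).io.token());    // first load
    System.out.println(supplier.get(500L).io.token());  // still within the interval, no reload
    System.out.println(supplier.get(1500L).io.token()); // interval elapsed, table reloaded
  }
}
```

The clock is passed in as a parameter so the reload behavior is easy to check. A real implementation would more likely drive the reload from the credential expiry time than from a fixed interval.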
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

