bryanck commented on PR #8555: URL: https://github.com/apache/iceberg/pull/8555#issuecomment-1722270134
> Normally, I would think table load (dozens or hundreds of milliseconds?) is probably a magnitude or two more expensive than credentials refresh (a few milliseconds?). This is where I share the same concern that @pvary raised about every task manager refreshing/loading tables periodically.
>
> Can you elaborate a little on how the REST catalog makes table load efficient?

We've implemented several optimizations in our catalog so table load responses are around 24ms, including our API auth checks. We have refs-only mode on by default. We're continuing work to make that even more efficient.

> I also echo @pvary's comment on the confusion between `TableLoader` and `TableSupplier`. There are actually some questions on the design of `TableLoader` (e.g. [not handling well close well](https://github.com/apache/iceberg/pull/6614)). I would be in favor of a refactor/redesign of the table loader/catalog lifecycle in Flink. But that will probably be a big discussion by itself.

I agree here. It is possible to reuse the `TableLoader` interface with a couple of changes if that seems acceptable.

> I also thought you use pre-signed URLs from the REST catalog. With that, what is the credentials expiration problem here? S3 session credentials also have auto-refresh capability. Hence, I am trying to understand more about the credentials expiration and why it is tied to the table object and needs a refresh.

Refreshing credentials with the REST API can be done in two ways: using OAuth2, which we use with the S3 signer, or using a custom mechanism that passes credentials in the table config response, which are then passed to the FileIO during initialization. We use the latter mechanism for GCP tokens, Azure SAS tokens, S3 SigV4 exchange, and S3 vended credentials. Because those credentials arrive with the table config, getting fresh ones means loading the table again.
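To make the periodic reload concrete, here is a minimal sketch of a supplier that caches a loaded value and reloads it once an interval has elapsed. It is only an illustration under stated assumptions: `RefreshingSupplier` is a hypothetical name and is not the code in this PR or an Iceberg class, and the loader is a plain `Supplier` that would stand in for a catalog table load.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not an Iceberg or Flink class): caches the value
 * returned by a loader and reloads it once the refresh interval has elapsed.
 */
final class RefreshingSupplier<T> implements Supplier<T> {
  private final Supplier<T> loader;      // assumption: wraps something like a catalog table load
  private final long refreshNanos;       // how long a loaded value is considered fresh
  private final LongSupplier nanoClock;  // injectable clock, e.g. System::nanoTime

  private T cached;
  private long loadedAtNanos;
  private boolean loaded;

  RefreshingSupplier(Supplier<T> loader, Duration refreshInterval, LongSupplier nanoClock) {
    this.loader = loader;
    this.refreshNanos = refreshInterval.toNanos();
    this.nanoClock = nanoClock;
  }

  @Override
  public synchronized T get() {
    long now = nanoClock.getAsLong();
    // Reload on first use, or when the cached value is older than the interval.
    if (!loaded || now - loadedAtNanos >= refreshNanos) {
      cached = loader.get();
      loadedAtNanos = now;
      loaded = true;
    }
    return cached;
  }
}
```

In a real job the loader would presumably be a table load against the catalog, so each reload picks up a fresh table config response and therefore fresh credentials for the FileIO. The interval would need to stay below the credential lifetime, and the cost of each reload is the table load latency discussed above.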
