bryanck commented on PR #8555: URL: https://github.com/apache/iceberg/pull/8555#issuecomment-1722157440
> @bryanck I have 2 questions to understand the problem here
>
> 1. how is this handled in Spark? I assume credentials could be valid for a few hours and a Spark job can run for 10+ hours. so this problem also applies to Spark.
> 2. can the credentials refresh capability be baked into `FileIO`?

Hi Steven, because the table is refreshed before each query in Spark, this would only impact queries running longer than credential expiration, and that hasn't been an issue for us specifically. It might be something that needs to be addressed at some point.

We considered baking something into the `FileIO`. We felt that adding table refresh capability to the writers is something that has been requested for a while, can provide credential and other configuration updates, and the REST API for it is already spec'd and available.

Implementation-wise, a centralized solution for refreshing the table metadata is ideal, but the implementation here, where the subtasks perform the refresh, still works for many cases. If it is a point of contention, we can remove the implementation (`ReloadingTableSupplier`) but keep the ability to set one.

Also, a FileIO-specific API for credential refresh would still add load to an Iceberg REST implementation, whereas a table load via the REST API with refs-only snapshot mode can be fairly efficient.
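To make the mechanism concrete, here is a minimal sketch of what a reloading table supplier could look like, assuming each writer subtask holds one and calls `get()` when it needs the table. This is illustrative only, not the actual `ReloadingTableSupplier` in this PR; the `reloadIntervalMs` knob is hypothetical, and a real implementation would need a serializable way to obtain the catalog rather than holding a `Catalog` field directly.

```java
import java.io.Serializable;
import java.util.function.Supplier;

import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

// Hypothetical sketch: reloads the table from the catalog at most once per
// interval, so a fresh load (e.g. a REST catalog request) picks up rotated
// storage credentials and other configuration carried in the table metadata.
class ReloadingTableSupplier implements Supplier<Table>, Serializable {
  private final Catalog catalog;           // assumed obtainable/serializable here
  private final TableIdentifier identifier;
  private final long reloadIntervalMs;     // illustrative knob, not from the PR

  private transient Table table;
  private transient long lastLoadedMs;

  ReloadingTableSupplier(Catalog catalog, TableIdentifier identifier, long reloadIntervalMs) {
    this.catalog = catalog;
    this.identifier = identifier;
    this.reloadIntervalMs = reloadIntervalMs;
  }

  @Override
  public Table get() {
    long now = System.currentTimeMillis();
    if (table == null || now - lastLoadedMs >= reloadIntervalMs) {
      // Catalog.loadTable issues a fresh table load, returning current
      // metadata; with a REST catalog and refs-only snapshot mode this
      // can be a fairly cheap request.
      table = catalog.loadTable(identifier);
      lastLoadedMs = now;
    }
    return table;
  }
}
```

The design point is that the refresh rides on the existing table-load path rather than a new credential-specific API, which is why it can deliver any metadata update, not just credentials.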
