bryanck commented on PR #8555:
URL: https://github.com/apache/iceberg/pull/8555#issuecomment-1722157440

   > @bryanck I have 2 questions to understand the problem here
   > 
   > 1. how is this handled in Spark? I assume credentials could be valid for a 
few hours and a Spark job can run for 10+ hours. so this problem also applies 
to Spark.
   > 2. can the credentials refresh capability be baked into `FileIO`?
   
   Hi Steven, because the table is refreshed before each query in Spark, 
credential expiration would only impact queries that run longer than the 
credential lifetime, and that hasn't been an issue for us specifically. It may 
be something that needs to be addressed at some point.
   
   We considered baking something into the FileIO. We felt that adding table 
refresh capability to the writers is something that has been requested for a 
while, it can deliver credential and other configuration updates, and the REST 
API for it is already spec'd and available.
   
   Implementation-wise, a centralized solution for refreshing the table 
metadata is ideal. The implementation here, where the subtasks perform the 
refresh, still works for many cases. If it is a point of contention, we can 
remove the implementation (ReloadingTableSupplier) but keep the ability to set 
one.
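
   For anyone following along, here is a minimal sketch of the subtask-side 
approach, assuming each subtask can reload through the catalog; the class and 
member names are illustrative rather than the exact ReloadingTableSupplier in 
this PR:

   ```java
   import java.time.Duration;
   import java.util.function.Supplier;
   import org.apache.iceberg.Table;
   import org.apache.iceberg.catalog.Catalog;
   import org.apache.iceberg.catalog.TableIdentifier;

   // Illustrative sketch, not the PR's exact ReloadingTableSupplier: each
   // subtask holds one of these and reloads the table after a fixed interval,
   // picking up refreshed credentials along with the rest of the metadata.
   class ReloadingTableSupplierSketch implements Supplier<Table> {
     private final Catalog catalog;            // assumed reachable from each subtask
     private final TableIdentifier identifier;
     private final long reloadIntervalMs;
     private Table table;
     private long lastLoadMs;

     ReloadingTableSupplierSketch(Catalog catalog, TableIdentifier identifier, Duration interval) {
       this.catalog = catalog;
       this.identifier = identifier;
       this.reloadIntervalMs = interval.toMillis();
     }

     @Override
     public Table get() {
       long now = System.currentTimeMillis();
       if (table == null || now - lastLoadMs >= reloadIntervalMs) {
         table = catalog.loadTable(identifier);  // full table load via the catalog
         lastLoadMs = now;
       }
       return table;
     }
   }
   ```

   Because every subtask reloads independently, the catalog sees one load per 
subtask per interval, which is where an efficient REST load (see below) helps 
keep the cost down.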
   
   On the alternative, a FileIO-specific API for credential refresh would 
still add load to an Iceberg REST implementation, and a table load via the 
REST API with refs-only snapshot mode can be fairly efficient.
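
   To make that last point concrete, if I'm reading the REST spec correctly 
the load-table endpoint accepts a `snapshots` query parameter, so a refs-only 
load looks roughly like this (namespace and table are placeholders):

   ```
   GET /v1/namespaces/{ns}/tables/{table}?snapshots=refs
   ```

   With `refs`, the response should include only the snapshots referenced by 
branches and tags rather than the full history, which keeps the payload small 
for tables with long snapshot lineages.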
   

