parthchandra commented on PR #4335: URL: https://github.com/apache/datafusion-comet/pull/4335#issuecomment-4480030753
Just wanted to add my two bits on credential refreshing. The credential providers run on every executor, so all executors end up requesting credentials at essentially the same time. At very large scale, this has been seen to overwhelm credential backends, sometimes leading to _system-wide_ job failure. Caching the credentials on the executors makes sense, but it is generally better to refresh centrally and distribute the credentials. It makes sense for the _engine_ to do the refresh. For instance, Spark manages Kerberos delegation tokens centrally in [HadoopDelegationTokenManager](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala).

This does open the question of secure _distribution_ of the credentials. Broadcast over an insecure channel will not do; credential distribution needs TLS.
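
To make the "refresh centrally, cache on executors" idea concrete, here is a rough Python sketch. It is not Comet or Spark code: `CountingBackend`, `ExecutorCache`, `CentralRefresher`, and the `renew_ratio` value of 0.75 are all hypothetical names and numbers chosen for illustration, and the in-process `update` call stands in for what would have to be an encrypted (TLS) channel in a real deployment.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Credentials:
    token: str
    expires_at: float  # seconds, on whatever clock the caller passes in


class CountingBackend:
    """Hypothetical stand-in for a credentials service; counts its calls."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl_seconds = ttl_seconds
        self.calls = 0
        self._lock = threading.Lock()

    def issue(self, now: float) -> Credentials:
        with self._lock:
            self.calls += 1
            return Credentials(f"token-{self.calls}", now + self.ttl_seconds)


class ExecutorCache:
    """Executor side: never calls the backend, only stores what it is sent."""

    def __init__(self):
        self._creds = None
        self._lock = threading.Lock()

    def update(self, creds: Credentials) -> None:
        with self._lock:
            self._creds = creds

    def get(self, now: float) -> Credentials:
        with self._lock:
            if self._creds is None or self._creds.expires_at <= now:
                raise RuntimeError("no valid credentials distributed yet")
            return self._creds


class CentralRefresher:
    """Driver side: one backend call per refresh, then fan out to executors."""

    def __init__(self, backend, executors, renew_ratio: float = 0.75):
        self.backend = backend
        self.executors = executors
        self.renew_ratio = renew_ratio  # assumed: renew after 75% of lifetime
        self._current = None
        self._issued_at = None

    def refresh_if_needed(self, now: float) -> bool:
        if self._current is not None:
            lifetime = self._current.expires_at - self._issued_at
            if now < self._issued_at + self.renew_ratio * lifetime:
                return False  # still fresh, backend is not contacted
        self._current = self.backend.issue(now)
        self._issued_at = now
        for executor in self.executors:
            # Assumption: in a real system this hop is an encrypted channel.
            executor.update(self._current)
        return True


if __name__ == "__main__":
    backend = CountingBackend(ttl_seconds=100.0)
    executors = [ExecutorCache() for _ in range(1000)]
    refresher = CentralRefresher(backend, executors)
    refresher.refresh_if_needed(now=0.0)
    refresher.refresh_if_needed(now=50.0)  # still fresh
    refresher.refresh_if_needed(now=80.0)  # past 75% of lifetime, renews
    print(backend.calls)  # 2, independent of the number of executors
```

The point of the sketch is only the call count: the backend sees one request per refresh regardless of how many executors exist, whereas per-executor refresh would scale the request burst with cluster size.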
