Hi Aleksandr,

Thanks for the effort!

I've missed this thread lately, but I have some thoughts/questions.

Up until now the model was one cluster per set of user credentials. I think
the multi-user model serves the needs better, so +1. We should mention this
on the main doc page later.

Up until now DelegationTokenProvider instances were singletons loaded by the
service loader. Now that we plan to add a stop function, does that mean we
plan to change the lifecycle?

A generic way to ask the delegation token manager to re-obtain tokens is a
long-standing need, but I didn't have time for it. A dedicated API for this
would probably be better than relying on the registerJob return value.

I'm not sure how it's planned, but the new immediate re-obtain scheduling
should be upper bounded, since some retry logic can be aggressive about
re-registration. A cooldown would also be fine.
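
To make the cooldown idea concrete, here is a minimal sketch. Everything in
it is an assumption for illustration: ReobtainCooldown and tryAcquire are
hypothetical names, not part of Flink or the FLIP. The caller would simply
skip (or defer) an immediate re-obtain when tryAcquire() returns false.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/** Hypothetical helper, not Flink code: limits how often an immediate re-obtain may run. */
final class ReobtainCooldown {
    private final long cooldownNanos;
    private final LongSupplier nanoClock; // injectable so the behavior can be tested
    private long lastGrantedNanos;
    private boolean grantedOnce;

    ReobtainCooldown(Duration cooldown, LongSupplier nanoClock) {
        this.cooldownNanos = cooldown.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Returns true if a re-obtain may be scheduled now, false while still cooling down. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        if (grantedOnce && now - lastGrantedNanos < cooldownNanos) {
            return false; // a re-obtain was granted too recently
        }
        grantedOnce = true;
        lastGrantedNanos = now;
        return true;
    }
}
```

In production the clock would be System::nanoTime. With this shape, many
re-registrations in a short window cause at most one extra re-obtain per
cooldown period.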

Last but not least, up until now there was a single thread on the critical
path, working on immutable structures. Now we plan to change that, which is
fine, but then I would like to see an exact plan of which threads do what
and how we protect against races/starvation/deadlock. A detailed look can
wait for the PR, but this is the gist of it from my perspective.
What I mean specifically is that even if we schedule the renewal the
existing way, at least the provider list manipulation and the originally
scheduled renewal can race. There may be others, since I can only imagine
the change at this point.
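
One possible way to avoid that particular race, purely as a sketch: keep the
single-thread model and route both provider list changes and renewal through
the same thread. SerializedTokenState and its methods are hypothetical names
of mine, and providers are reduced to plain strings here; this is not how
the FLIP proposes to do it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch, not Flink code: all state access is confined to one thread. */
final class SerializedTokenState implements AutoCloseable {
    private final ExecutorService tokenThread = Executors.newSingleThreadExecutor();
    // Only ever read or written from tokenThread, so no locking is needed.
    private final List<String> providers = new ArrayList<>();

    CompletableFuture<Void> addProvider(String name) {
        return CompletableFuture.runAsync(() -> providers.add(name), tokenThread);
    }

    CompletableFuture<Void> removeProvider(String name) {
        return CompletableFuture.runAsync(() -> providers.remove(name), tokenThread);
    }

    /** Runs on the same thread as the mutations, so it sees a consistent provider list. */
    CompletableFuture<List<String>> renew() {
        return CompletableFuture.supplyAsync(() -> new ArrayList<>(providers), tokenThread);
    }

    @Override
    public void close() {
        tokenThread.shutdown();
    }
}
```

The trade-off is that a slow provider blocks everything queued behind it, so
the starvation question would still need an answer.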

BR,
G


On 2026/06/05 16:35:15 Aleksandr Savonin wrote:
> Hi everyone,
> 
> Alan Sheinberg and I would like to start a discussion on FLIP-588:
> Support per-job delegation tokens [1].
> Flink's delegation token framework is currently cluster-scoped, which
> means a DelegationTokenProvider has no notion of an individual job.
> This breaks when different jobs on the same cluster need to
> authenticate as different identities to the same external service.
> To resolve this, the FLIP adds per-job lifecycle hooks
> (registerJob/unregisterJob/stop) as default methods on the
> DelegationTokenProvider SPI, along with the runtime wiring to invoke
> them on job start and stop.
> This change is fully backward compatible (new methods are default
> no-ops). It is worth mentioning that it widens the internal
> registerJobMaster RPC to carry the job configuration.
> 
> Looking forward to your feedback.
> 
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-588%3A+Support+per-job+delegation+tokens
> 
> -- 
> Kind regards,
> Aleksandr
> 
