I checked the logs of my tests and found that Spark schedules the token refresh based on the renew-interval property, not the max-lifetime.
The settings in my tests:

dfs.namenode.delegation.key.update-interval=520000
dfs.namenode.delegation.token.max-lifetime=1020000
dfs.namenode.delegation.token.renew-interval=520000

During the job submission, spark.yarn.token.renewal.interval is set:

2016-11-04 09:12:25 INFO Client:59 - Renewal Interval set to 520036

Then it takes ~0.75*spark.yarn.token.renewal.interval to schedule the token refresh:

2016-11-04 09:12:37 INFO ExecutorDelegationTokenUpdater:59 - Scheduling token refresh from HDFS in 404251 millis.
...
2016-11-04 09:19:21 INFO ExecutorDelegationTokenUpdater:59 - Reading new delegation tokens from ...
...
2016-11-04 09:19:21 INFO ExecutorDelegationTokenUpdater:59 - Scheduling token refresh from HDFS in 390064 millis.
...
2016-11-04 09:25:52 INFO ExecutorDelegationTokenUpdater:59 - Reading new delegation tokens from ...
...
2016-11-04 09:25:52 INFO ExecutorDelegationTokenUpdater:59 - Scheduling token refresh from HDFS in 390022 millis.

This was what confused me in the first place: why does Spark ask for new tokens based on the renew-interval instead of the max-lifetime?

2016-11-04 2:37 GMT+01:00 Marcelo Vanzin <van...@cloudera.com>:
> On Thu, Nov 3, 2016 at 3:47 PM, Zsolt Tóth <toth.zsolt....@gmail.com> wrote:
> > What is the purpose of the delegation token renewal (the one that is done
> > automatically by Hadoop libraries, after 1 day by default)? It seems that it
> > always happens (every day) until the token expires, no matter what. I'd
> > probably find an answer to that in a basic Hadoop security description.
>
> I'm not sure and I never really got a good answer to that (I had the
> same question in the past). My best guess is to limit how long an
> attacker can do bad things if he gets hold of a delegation token. But
> IMO if an attacker gets a delegation token, that's pretty bad
> regardless of how long he can use it...
>
> > I have a feeling that giving the keytab to Spark bypasses the concept
> > behind delegation tokens. As I understand, the NN basically says that
> > "your application can access hdfs with this delegation token, but only
> > for 7 days".
>
> I'm not sure why there's a 7 day limit either, but let's assume
> there's a good reason. Basically the app, at that point, needs to
> prove to the NN it has a valid kerberos credential. Whether that's
> from someone typing their password into a terminal, or code using a
> keytab, it doesn't really matter. If someone was worried about that
> user being malicious they'd disable the user's login in the KDC.
>
> This feature is needed because there are apps that need to keep
> running, unattended, for longer than HDFS's max lifetime setting.
>
> --
> Marcelo
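For what it's worth, the ~0.75 factor in my logs can be modeled with a small sketch. This is only an illustration of the scheduling behaviour I observed, not Spark's actual ExecutorDelegationTokenUpdater code; the function name is made up:

```python
# Illustrative sketch (not Spark's actual code) of the behaviour seen in the
# logs: the next refresh is scheduled roughly 75% of the way into the renewal
# interval, counted from when the tokens were issued.
def next_refresh_delay_ms(issue_time_ms, renewal_interval_ms, now_ms):
    """Milliseconds until the next token refresh (hypothetical helper)."""
    refresh_at_ms = issue_time_ms + int(0.75 * renewal_interval_ms)
    return max(refresh_at_ms - now_ms, 0)

# With the renewal interval from the log (520036 ms) and no time elapsed
# since issuance, the delay is 390027 ms, close to the 390064 and 390022
# values logged above.
print(next_refresh_delay_ms(0, 520036, 0))
```

The later delays in the log (~390000 ms) match 0.75 * 520036 almost exactly; the first one (404251 ms) is slightly larger, presumably because it is measured against the token's issue time rather than the moment the updater started.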