I checked my test logs and found that Spark schedules the token
refresh based on the renew-interval property, not the max-lifetime.

The settings in my tests:
dfs.namenode.delegation.key.update-interval=520000
dfs.namenode.delegation.token.max-lifetime=1020000
dfs.namenode.delegation.token.renew-interval=520000
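For scale, here is a quick conversion of those settings to minutes (my own sanity check, not from the logs) showing that the max-lifetime is roughly twice the renew-interval:

```python
# Test settings from above, all in milliseconds.
key_update_interval_ms = 520_000   # dfs.namenode.delegation.key.update-interval
max_lifetime_ms = 1_020_000        # dfs.namenode.delegation.token.max-lifetime
renew_interval_ms = 520_000        # dfs.namenode.delegation.token.renew-interval

print(f"renew-interval: {renew_interval_ms / 60_000:.1f} min")   # 8.7 min
print(f"max-lifetime:   {max_lifetime_ms / 60_000:.1f} min")     # 17.0 min
# max-lifetime is about 2x the renew-interval in this test setup.
print(f"ratio: {max_lifetime_ms / renew_interval_ms:.2f}")       # 1.96
```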

During job submission, spark.yarn.token.renewal.interval is set:
2016-11-04 09:12:25 INFO  Client:59 - Renewal Interval set to 520036

The token refresh is then scheduled after
~0.75 * spark.yarn.token.renewal.interval:

2016-11-04 09:12:37 INFO  ExecutorDelegationTokenUpdater:59 - Scheduling
token refresh from HDFS in 404251 millis.
...
2016-11-04 09:19:21 INFO  ExecutorDelegationTokenUpdater:59 - Reading new
delegation tokens from ...
...
2016-11-04 09:19:21 INFO  ExecutorDelegationTokenUpdater:59 - Scheduling
token refresh from HDFS in 390064 millis.
...
2016-11-04 09:25:52 INFO  ExecutorDelegationTokenUpdater:59 - Reading new
delegation tokens from ...
...
2016-11-04 09:25:52 INFO  ExecutorDelegationTokenUpdater:59 - Scheduling
token refresh from HDFS in 390022 millis.
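A back-of-the-envelope check of the 0.75 factor against the log lines above (the 0.75 multiplier is my reading of these logs, not something I have confirmed in Spark's documentation):

```python
# Renewal interval reported by Client during submission, in ms.
renewal_interval_ms = 520_036      # "Renewal Interval set to 520036"

# Predicted refresh delay if Spark schedules at 0.75 of the interval.
expected_delay_ms = 0.75 * renewal_interval_ms
print(expected_delay_ms)           # 390027.0

# The steady-state delays logged by ExecutorDelegationTokenUpdater are
# within ~40 ms of that prediction.
for logged_ms in (390_064, 390_022):
    assert abs(logged_ms - expected_delay_ms) < 100
```

(The very first delay, 404251 ms, is larger; I assume that is an offset from startup timing rather than a different factor.)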

This was what confused me in the first place. Why does Spark ask for new
tokens based on the renew-interval instead of the max-lifetime?


2016-11-04 2:37 GMT+01:00 Marcelo Vanzin <van...@cloudera.com>:

> On Thu, Nov 3, 2016 at 3:47 PM, Zsolt Tóth <toth.zsolt....@gmail.com>
> wrote:
> > What is the purpose of the delegation token renewal (the one that is done
> > automatically by Hadoop libraries, after 1 day by default)? It seems
> that it
> > always happens (every day) until the token expires, no matter what. I'd
> > probably find an answer to that in a basic Hadoop security description.
>
> I'm not sure and I never really got a good answer to that (I had the
> same question in the past). My best guess is to limit how long an
> attacker can do bad things if he gets hold of a delegation token. But
> IMO if an attacker gets a delegation token, that's pretty bad
> regardless of how long he can use it...
>
> > I have a feeling that giving the keytab to Spark bypasses the concept
> behind
> > delegation tokens. As I understand, the NN basically says that "your
> > application can access hdfs with this delegation token, but only for 7
> > days".
>
> I'm not sure why there's a 7 day limit either, but let's assume
> there's a good reason. Basically the app, at that point, needs to
> prove to the NN it has a valid kerberos credential. Whether that's
> from someone typing their password into a terminal, or code using a
> keytab, it doesn't really matter. If someone was worried about that
> user being malicious they'd disable the user's login in the KDC.
>
> This feature is needed because there are apps that need to keep
> running, unattended, for longer than HDFS's max lifetime setting.
>
> --
> Marcelo
>
