[
https://issues.apache.org/jira/browse/SPARK-33440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jungtaek Lim resolved SPARK-33440.
----------------------------------
Fix Version/s: 3.0.2
3.1.0
Resolution: Fixed
Issue resolved by pull request 30366
[https://github.com/apache/spark/pull/30366]
> Spark schedules delegation token updates with 0 interval under some token
> provider implementations
> -----------------------------------------------------------------------------------------------------
>
> Key: SPARK-33440
> URL: https://issues.apache.org/jira/browse/SPARK-33440
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.1, 3.1.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
> Fix For: 3.1.0, 3.0.2
>
>
> We got a report from a customer that, under specific circumstances, Spark
> schedules delegation token updates with a 0 interval, which ends up flooding
> the log with messages and sending massive numbers of requests to the token
> handler side.
> After investigation, the problem turned out to be that they have two
> delegation token identifiers: one of them (IDBS3ATokenIdentifier) has an
> "issue date" of 0, whereas the other (DelegationTokenIdentifier) has a
> correct value.
> Both provide the expiration time correctly via Token.renew(), and Spark
> assumes the issue date is "correct", hence it calculates the renewal
> interval for each token as (the result of Token.renew() - "issue date").
> {code}
> 20/10/13 06:34:19 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 1603175657000 for token S3ADelegationToken/IDBroker
> 20/10/13 06:34:19 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 86400048 for token HDFS_DELEGATION_TOKEN
> {code}
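> In other words, the calculation Spark effectively performs looks like the
> following sketch (simplified, illustrative code, not the actual
> HadoopFSDelegationTokenProvider source; the HDFS timestamps are hypothetical
> values chosen to match the interval in the log above):
> {code}
> // Renewal interval per token, assuming Token.renew() returns the
> // expiration timestamp in epoch millis.
> def renewalInterval(expiryTime: Long, issueDate: Long): Long =
>   expiryTime - issueDate
>
> // IDBS3ATokenIdentifier reports issue date 0, so the "interval" is the
> // raw expiry timestamp itself.
> val idBrokerInterval = renewalInterval(1603175657000L, 0L)  // 1603175657000
>
> // DelegationTokenIdentifier has a sane issue date, so the interval is
> // roughly one day (hypothetical timestamps matching 86400048 ms).
> val hdfsInterval = renewalInterval(1602570859048L, 1602484459000L)  // 86400048
> {code}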
> This is still safe here, because Spark picks the minimal interval. The
> problem is that, to calculate the next renewal timestamp, Spark adds the
> renewal interval to the issue date of every token and picks the minimum,
> hence "86400048" (issue date 0 + interval 86400048) is picked as the next
> renewal timestamp.
> This is earlier than now, so the interval to schedule becomes negative (as
> we subtract the current time from it), and Spark applies a safeguard that
> picks the greater of 0 and the interval, hence 0 is picked and the token
> update is scheduled infinitely. (Each schedule is one-time, but the
> calculation always yields a negative value, so it is effectively an
> immediate re-schedule every time.)
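> To illustrate the failure mode, here is a minimal sketch of the scheduling
> math (illustrative values and names, not the actual Spark scheduling code):
> {code}
> // Minimal renewal interval across tokens (values from the log above).
> val minInterval = math.min(1603175657000L, 86400048L)  // 86400048
>
> // Next renewal per token = issue date + minimal interval; minimum wins.
> val hdfsIssueDate = 1602484459000L                     // hypothetical sane value
> val nextRenewalIdBroker = 0L + minInterval             // 86400048 = Jan 2, 1970 (!)
> val nextRenewalHdfs = hdfsIssueDate + minInterval
> val nextRenewal = math.min(nextRenewalIdBroker, nextRenewalHdfs)
>
> // The delay until the next update is negative: nextRenewal is in 1970.
> val delay = nextRenewal - System.currentTimeMillis()
>
> // The existing safeguard clamps to 0, i.e. "run immediately", every time.
> val scheduleDelay = math.max(0L, delay)
> {code}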
> We should construct a better safeguard here, instead of merely guarding the
> schedule interval against going negative.
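> One possible direction, as a rough sketch only (the actual change is in the
> linked PR; the helper below is hypothetical): treat a clearly bogus issue
> date as invalid and fall back to the current time, so the derived next
> renewal timestamp can never land in the distant past.
> {code}
> // Hypothetical sketch of a saner guard, not the code from PR 30366:
> // only trust the issue date when it is positive and not in the future.
> def effectiveIssueDate(issueDate: Long, now: Long): Long =
>   if (issueDate > 0 && issueDate <= now) issueDate else now
> {code}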