[ 
https://issues.apache.org/jira/browse/SPARK-37329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444890#comment-17444890
 ] 

Wei-Chiu Chuang commented on SPARK-37329:
-----------------------------------------

I should also note that this affects not just KMS, but any file system 
implementation (HDFS, Ozone, perhaps S3) with delegation token support.

> File system delegation tokens are leaked
> ----------------------------------------
>
>                 Key: SPARK-37329
>                 URL: https://issues.apache.org/jira/browse/SPARK-37329
>             Project: Spark
>          Issue Type: Bug
>          Components: Security, YARN
>    Affects Versions: 2.4.0
>            Reporter: Wei-Chiu Chuang
>            Priority: Major
>
> On a very busy Hadoop cluster (with HDFS at-rest encryption) we found that 
> KMS accumulated millions of delegation tokens that were not cancelled even 
> after jobs finished, and KMS ran out of memory within a day because of the 
> delegation token leak.
> We were able to reproduce the bug in a smaller test cluster, and realized 
> that when a Spark job starts, it acquires two delegation tokens, and only one 
> is cancelled properly after the job finishes. The other one is left over and 
> lingers for up to 7 days (the default Hadoop delegation token lifetime).
> YARN handles the lifecycle of a delegation token properly if its renewer is 
> 'yarn'. However, Spark intentionally (a hack?) acquires a second delegation 
> token with the job issuer as the renewer, simply to get the token renewal 
> interval. The token is then ignored but not cancelled.
> Proposal: cancel the second delegation token immediately after the token 
> renewal interval is obtained.
> Environment: CDH6.3.2 (based on Apache Spark 2.4.0), but the bug has probably 
> existed since day 1.
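
To make the leak and the proposed fix concrete, here is a minimal toy model in Python. It is only a sketch and is not Spark or Hadoop code: ToyTokenService, run_job, the cancel_probe_token flag and the 24-hour RENEW_INTERVAL_MS are all hypothetical names and values made up for illustration. The model also assumes the renewal interval is derived as "expiration returned by renew() minus the token's issue date", and that YARN cancels the 'yarn'-renewer token when the job ends.

```python
from dataclasses import dataclass
from itertools import count

# Assumed renewal interval for this toy model only (24 hours, in ms).
RENEW_INTERVAL_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class Token:
    token_id: int
    renewer: str
    issue_date_ms: int


class ToyTokenService:
    """Hypothetical in-memory stand-in for a token issuer such as KMS."""

    def __init__(self):
        self._ids = count(1)
        self.live = {}  # token_id -> Token; this is what grows when tokens leak

    def get_delegation_token(self, renewer, now_ms=0):
        token = Token(next(self._ids), renewer, now_ms)
        self.live[token.token_id] = token
        return token

    def renew(self, token, now_ms=0):
        # Returns the new expiration time, as a renew call typically would.
        if token.token_id not in self.live:
            raise KeyError("token was cancelled or never issued")
        return now_ms + RENEW_INTERVAL_MS

    def cancel(self, token):
        self.live.pop(token.token_id, None)


def run_job(service, user, cancel_probe_token):
    """Simulate one job: acquire two tokens, learn the interval, finish."""
    # Token 1: renewer is 'yarn'; YARN manages its lifecycle.
    yarn_token = service.get_delegation_token(renewer="yarn")
    # Token 2: renewer is the job user; needed only to read the interval.
    probe = service.get_delegation_token(renewer=user)
    try:
        interval_ms = service.renew(probe) - probe.issue_date_ms
    finally:
        if cancel_probe_token:
            # Proposed fix: cancel as soon as the interval is known.
            service.cancel(probe)
    # Job finishes; model YARN cancelling the token it is the renewer for.
    service.cancel(yarn_token)
    return interval_ms


if __name__ == "__main__":
    leaky, fixed = ToyTokenService(), ToyTokenService()
    for _ in range(1000):
        run_job(leaky, "alice", cancel_probe_token=False)
        run_job(fixed, "alice", cancel_probe_token=True)
    print("tokens left without fix:", len(leaky.live))
    print("tokens left with fix:", len(fixed.live))
```

In this model, every job that skips the cancel step leaves exactly one token behind on the service, which is the accumulation pattern described above. Placing the cancel in a finally block means the probe token is also released when the renew call fails.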



