[ https://issues.apache.org/jira/browse/OOZIE-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905610#comment-15905610 ]
Satish Subhashrao Saley commented on OOZIE-2807:
------------------------------------------------
I liked Robert's suggestions. The addRMDelegationToken method will be the only
place for adding tokens, which should make duplicate additions easier to track.
I have updated the patch.
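
To illustrate why a single entry point makes duplicate additions easier to track, here is a minimal sketch. It is not the code from the attached patches: apart from the addRMDelegationToken name, every class, field, and parameter is hypothetical, and a plain String stands in for a real delegation token.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not code from the OOZIE-2807 patches.
final class SingleEntryTokenStore {
    // One token per job id; a String stands in for a real delegation token.
    private final Map<String, String> tokensByJob = new HashMap<>();
    private int fetchCount = 0;

    // The only method that adds a token, so a duplicate request is caught here.
    synchronized String addRMDelegationToken(String jobId, Function<String, String> fetcher) {
        String existing = tokensByJob.get(jobId);
        if (existing != null) {
            return existing; // duplicate addition: reuse instead of fetching again
        }
        fetchCount++;
        String token = fetcher.apply(jobId);
        tokensByJob.put(jobId, token);
        return token;
    }

    // Number of times the (stand-in) fetcher was actually invoked.
    synchronized int getFetchCount() {
        return fetchCount;
    }
}
```

With every addition going through one method, a second request for the same job id is visible in one place and does not trigger another fetch.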
> Oozie gets RM delegation token even for checking job status
> -----------------------------------------------------------
>
> Key: OOZIE-2807
> URL: https://issues.apache.org/jira/browse/OOZIE-2807
> Project: Oozie
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
> Assignee: Satish Subhashrao Saley
> Fix For: 5.0.0
>
> Attachments: OOZIE-2807-1.patch, OOZIE-2807-2.patch,
> OOZIE-2807-3.patch, OOZIE-2807-4.patch
>
>
> We had one user submitting far too many workflows, each with a single Hive
> query: ~3600 workflows running concurrently. Surprisingly, Oozie held up well
> without issues.
> But [~daryn] from our Hadoop team saw that the number of delegation tokens
> fetched by Oozie was very high compared to the actual number of jobs
> submitted. This was stressing the RM with the calls and pushing it close to
> its memory limits. The cause is that we fetch a delegation token every time
> we create a JobClient instead of only during job submission.
> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/service/HadoopAccessorService.java#L503-L519
> So for one job we fetch
> 1) 1 token during submission
> 2) 1 token every 5 minutes when we check status of job
> 3) 1 token after the job ends to retrieve status.
> 4) 1 token if we are killing the job.
> So for a job running for 11 minutes, we would have fetched the token 4 times.
> Maybe more in other cases, like MapReduce, where we check for the end of both
> the launcher and the child job.
> Only 1 of these tokens (the one used in the job submission) will be cancelled
> after the job completes. The other tokens are effectively leaked and will
> only be cleaned up by the RM after the expiry period (24 hrs is the default).
> This can make the RM run out of memory.
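
The per-job count described above can be checked with a rough model. This is a back-of-the-envelope sketch, not Oozie code: the class and method names are hypothetical, and it assumes one status check per full interval of runtime (integer division), which is a simplification of the real polling behaviour.

```java
// Hypothetical model of the token counts in the issue description; not Oozie code.
final class TokenFetchEstimate {

    // Tokens fetched for one job under the pre-fix behaviour described above.
    static int tokensFetched(int runtimeMinutes, int statusCheckIntervalMinutes, boolean killed) {
        if (runtimeMinutes < 0 || statusCheckIntervalMinutes <= 0) {
            throw new IllegalArgumentException("runtime must be >= 0 and interval must be > 0");
        }
        int submission = 1;                                              // 1) at submission
        int statusChecks = runtimeMinutes / statusCheckIntervalMinutes;  // 2) one per full interval (assumption)
        int finalStatus = 1;                                             // 3) after the job ends
        int kill = killed ? 1 : 0;                                       // 4) only if the job is killed
        return submission + statusChecks + finalStatus + kill;
    }

    // Only the submission token is cancelled; the rest linger until expiry.
    static int tokensLeaked(int fetched) {
        return Math.max(0, fetched - 1);
    }

    public static void main(String[] args) {
        int fetched = tokensFetched(11, 5, false);
        // prints "fetched=4 leaked=3"
        System.out.println("fetched=" + fetched + " leaked=" + tokensLeaked(fetched));
    }
}
```

For the 11-minute job in the description, with a 5-minute status check interval, the model gives 4 fetches, of which 3 stay with the RM until they expire.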