[ 
https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588367#comment-14588367
 ] 

Thejas M Nair commented on HIVE-11008:
--------------------------------------

As mentioned in the description, this issue happens because of the difference 
between the jobs retained by the RM and those retained by the job history 
server. That problem applies only to the showJobList() call, and only when 
showDetails is set to true.
This is not an ideal solution, but since the jobclient is not able to 
distinguish between real failures that it needs to retry on (e.g. transient fs 
errors) and failures due to the job not existing, we don't have any good 
alternative.
For showJobId(), it is better to still retry.

If we move this to StatusDelegator.run(), we will have to pass some boolean to 
it, so that this is set only in case of showJobList() call. Please let me know 
if you think that is better.
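
To make the trade-off concrete, here is a rough sketch of the behaviour 
described above: the list call makes a single attempt per job when showDetails 
is true, while the single-job call keeps retrying. Every name in it 
(JobDetailFetcher, DetailSource, listJobs, fetchWithRetry) is hypothetical and 
is not taken from the patch or from the WebHCat code. Reporting a job whose 
details could not be fetched as "details unavailable" is also just an 
assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; none of these names are WebHCat's actual API. */
class JobDetailFetcher {
    /** Stand-in for a call to the history server; throws on any failure. */
    interface DetailSource {
        String fetch(String jobId) throws Exception;
    }

    private final DetailSource source;
    private final int maxAttempts;

    JobDetailFetcher(DetailSource source, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.source = source;
        this.maxAttempts = maxAttempts;
    }

    /** Single-job path (showJobId-like): retry, since a failure may be transient. */
    String fetchWithRetry(String jobId) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return source.fetch(jobId);
            } catch (Exception e) {
                last = e; // cannot tell "transient error" from "job does not exist"
            }
        }
        throw last;
    }

    /** List path (showJobList-like): one attempt per job, no retries. */
    List<String> listJobs(List<String> jobIds, boolean showDetails) {
        List<String> out = new ArrayList<>();
        for (String id : jobIds) {
            if (!showDetails) {
                out.add(id);
                continue;
            }
            try {
                out.add(id + ": " + source.fetch(id));
            } catch (Exception e) {
                // Assumed handling: keep the job in the list without details.
                out.add(id + ": details unavailable");
            }
        }
        return out;
    }
}
```

With this shape, a job that the RM still lists but the history server has 
already dropped costs one failed call in the list path instead of maxAttempts 
failed calls.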


> webhcat GET /jobs retries on getting job details from history server is too 
> aggressive
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-11008
>                 URL: https://issues.apache.org/jira/browse/HIVE-11008
>             Project: Hive
>          Issue Type: Bug
>          Components: WebHCat
>    Affects Versions: 1.2.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>         Attachments: HIVE-11008.1.patch
>
>
> The WebHCat "jobs" API gets the list of jobs from the RM and then gets 
> details from the history server.
> The RM retains a fixed number of jobs, sized to fit the memory it has, while 
> HistoryServer retains jobs based on their age. As a result, jobs that the RM 
> returns might not be present in HistoryServer, and fetching their details can 
> fail. The request is also retried on failure even when the failure happens 
> because the job does not exist. 
> The retries to get details from HistoryServer in such cases are too aggressive.



