[ 
https://issues.apache.org/jira/browse/HADOOP-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640135#action_12640135
 ] 

Vinod K V commented on HADOOP-4296:
-----------------------------------

-1. I think this patch itself should also make JobClient break out of its 
polling loop as soon as it detects that the job is complete, because that is 
the correct fix irrespective of JobClient's polling interval. Note that we 
currently do not synchronize the client's polling interval with the 
MIN_TIME_BEFORE_RETIRE that you added, so if the client's polling interval 
later becomes configurable or increases beyond MIN_TIME_BEFORE_RETIRE, this 
problem will surface again.
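
A minimal sketch of that loop shape, for illustration only. Status, Tracker,
replay and waitForCompletion are hypothetical names, not the real JobClient or
JobTracker API, and the null return for a retired job is an assumption based
on the NullPointerException in the report below. The point is that the client
returns as soon as it sees completion, so it never issues a further status
request that could race with retirement, whatever the polling interval is.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class PollUntilComplete {

    /** Hypothetical snapshot of job progress; not a Hadoop class. */
    static final class Status {
        final int reducePercent;
        final boolean complete;

        Status(int reducePercent, boolean complete) {
            this.reducePercent = reducePercent;
            this.complete = complete;
        }
    }

    /** Hypothetical status source. Assumed to return null once the job is retired. */
    interface Tracker {
        Status getStatus();
    }

    /**
     * Polls until the job reports completion and then returns immediately.
     * Returns the last non-null status seen, or null if none was ever seen.
     */
    static Status waitForCompletion(Tracker tracker, long pollMillis) {
        Status last = null;
        while (true) {
            Status current = tracker.getStatus();
            if (current == null) {
                // Assumed retired-job case: do not dereference null,
                // hand back the last status we actually observed.
                return last;
            }
            last = current;
            if (current.complete) {
                // Leave the loop now; no further poll is issued.
                return current;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return last;
            }
        }
    }

    /** Demo helper: replays a fixed sequence, then returns null as if retired. */
    static Tracker replay(Status... statuses) {
        Deque<Status> queue = new ArrayDeque<>(Arrays.asList(statuses));
        return queue::poll; // poll() yields null once the queue is empty
    }

    public static void main(String[] args) {
        Status s = waitForCompletion(
            replay(new Status(98, false), new Status(99, false), new Status(100, true)), 1L);
        System.out.println(s.reducePercent + " " + s.complete);
    }
}
```

If the job is retired before the client ever observes completion, this sketch
can only report the last progress it saw; deciding success or failure in that
case would need another source, such as the job history files.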

> Spasm of JobClient failures on successful jobs every once in a while
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4296
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.1
>            Reporter: Joydeep Sen Sarma
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4296_jt_delayretire.patch, 4296_jt_delayretire2.patch
>
>
> At very busy times we get a wave of job client failures all at the same 
> time. The failures come when the job is about to complete. When we look at 
> the job history files, the jobs are actually complete. Here's the stack:
> 08/09/27 02:18:00 INFO mapred.JobClient:  map 100% reduce 98%
> 08/09/27 02:18:41 INFO mapred.JobClient:  map 100% reduce 99% 
> java.lang.NullPointerException
>       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:993)
>       at com.facebook.hive.common.columnSetLoader.main(columnSetLoader.java:535)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
