[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527098#comment-14527098
 ] 

Khaja Hussain commented on MAPREDUCE-6312:
------------------------------------------

To Radim: My failure triggered the same exception, but the log makes it clear 
that my issue is different from yours. In that case you can ignore my comments. 
Thanks.

> Hive fails due to stale proxy in ClientServiceDelegate
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-6312
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6312
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.5.0
>            Reporter: Radim Kubacki
>
> ClientServiceDelegate initializes its realProxy field to AMProxy for a new or 
> running job. Later, when the job finishes, it does not update this proxy to 
> query the history server, and the AM does not return valid data for this job.
> We found this while investigating 
> https://issues.cloudera.org/browse/DISTRO-631, which describes a Hive failure 
> caused by a loop like this:
> {code}
>   progress(JobClient jc, RunningJob rj) { ...
>         while (!rj.isComplete() || (extraRounds > 0)) {
>             try {
>                 Thread.sleep(1000);
>             } catch (InterruptedException e) {
>             }
>             RunningJob newRj = jc.getJob(rj.getID());
>             if (newRj == null) {
>                 // under exceptional load, hadoop may not be able to look up status
>                 // of finished jobs (because it has purged them from memory). From
>                 // hive's perspective - it's equivalent to the job having failed.
>                 // So raise a meaningful exception
>                 throw new IOException("Could not find status of job:" + rj.getID());
>             } else {
>                 rj = newRj;
>             }
>         }
> {code}
> In this snippet, JobClient.getJob tries to create a RunningJob instance 
> referring to the job file in /user/$USER/.staging even when the job is finished 
> and the file has been moved to /user/history/done (or 
> /user/history/done_intermediate). Note that Hive queries can still succeed when 
> HDFS happens to perform the actual file delete with a delay.
> We can try to write a patch if there is an agreement that this should be 
> fixed.
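
The stale-proxy behaviour described in the report above can be illustrated with 
a minimal, self-contained sketch. Every type in it (StaleProxySketch, 
StatusSource, CachingDelegate, RefreshingDelegate) is a hypothetical stand-in 
and not a Hadoop class. The fallback shown is only one possible remedy, not the 
actual ClientServiceDelegate code or a proposed patch.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: every type here is a hypothetical stand-in, not a Hadoop class.
final class StaleProxySketch {

    /** Answers status queries; returns null when it does not know the job. */
    interface StatusSource {
        String getStatus(String jobId);
    }

    /** Mirrors the reported behaviour: the proxy is chosen once and never refreshed. */
    static final class CachingDelegate {
        private final StatusSource realProxy;

        CachingDelegate(StatusSource am) {
            this.realProxy = am;
        }

        String getStatus(String jobId) {
            return realProxy.getStatus(jobId);
        }
    }

    /** One possible remedy: switch to the history source once the AM forgets the job. */
    static final class RefreshingDelegate {
        private final StatusSource history;
        private StatusSource realProxy;

        RefreshingDelegate(StatusSource am, StatusSource history) {
            this.realProxy = am;
            this.history = history;
        }

        String getStatus(String jobId) {
            String status = realProxy.getStatus(jobId);
            if (status == null && realProxy != history) {
                // The current proxy no longer knows the job: re-point and retry once.
                realProxy = history;
                status = realProxy.getStatus(jobId);
            }
            return status;
        }
    }

    public static void main(String[] args) {
        // Two in-memory maps stand in for the AM and the history server.
        Map<String, String> amJobs = new HashMap<>();
        Map<String, String> historyJobs = new HashMap<>();
        amJobs.put("job_1", "RUNNING");

        CachingDelegate caching = new CachingDelegate(amJobs::get);
        RefreshingDelegate refreshing = new RefreshingDelegate(amJobs::get, historyJobs::get);

        // The job finishes: the AM drops it and the history server takes over.
        amJobs.remove("job_1");
        historyJobs.put("job_1", "SUCCEEDED");

        System.out.println("caching:    " + caching.getStatus("job_1"));
        System.out.println("refreshing: " + refreshing.getStatus("job_1"));
    }
}
```

In this sketch the caching delegate keeps asking the source that has already 
forgotten the job, while the refreshing delegate re-points itself on the first 
miss and then stays on the history source.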



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
