[ https://issues.apache.org/jira/browse/MAPREDUCE-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521286#comment-14521286 ]
Radim Kubacki commented on MAPREDUCE-6312:
------------------------------------------
Another related bug has been filed against Hive:
https://issues.apache.org/jira/browse/HIVE-8339. This time there is a patch that
applies a workaround for this problem on Hive's side.
> Hive fails due to stale proxy in ClientServiceDelegate
> ------------------------------------------------------
>
> Key: MAPREDUCE-6312
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6312
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client
> Affects Versions: 2.5.0
> Reporter: Radim Kubacki
>
> ClientServiceDelegate initializes its realProxy field to AMProxy for a new or
> running job. Later, when the job finishes, it does not update this proxy to
> query the history server, and the AM no longer returns valid data for this job.
> We found this while investigating
> https://issues.cloudera.org/browse/DISTRO-631, which describes a Hive failure
> caused by a polling loop like this one:
> {code}
> progress(JobClient jc, RunningJob rj) { ...
>   while (!rj.isComplete() || (extraRounds > 0)) {
>     try {
>       Thread.sleep(1000);
>     } catch (InterruptedException e) {
>     }
>     RunningJob newRj = jc.getJob(rj.getID());
>     if (newRj == null) {
>       // under exceptional load, hadoop may not be able to look up status
>       // of finished jobs (because it has purged them from memory). From
>       // hive's perspective - it's equivalent to the job having failed.
>       // So raise a meaningful exception
>       throw new IOException("Could not find status of job:" + rj.getID());
>     } else {
>       rj = newRj;
>     }
>   }
> {code}
> In this snippet, JobClient.getJob tries to create a RunningJob instance
> referring to the job file in /user/$USER/.staging even after the job has
> finished and the file has been moved to /user/history/done (or
> /user/history/done_intermediate).
> Note that Hive queries can still succeed when, by chance, HDFS performs the
> actual file deletion with a delay.
> We can try to write a patch if there is agreement that this should be
> fixed.
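To make the stale-proxy problem concrete, here is a minimal Java sketch of the direction a fix could take: instead of choosing realProxy once and keeping it forever, re-check whether the job has finished before each call and, once it has, switch from the AM to the history server. All names here (StaleProxySketch, StatusSource, RefreshingDelegate, and the jobFinished check) are hypothetical illustrations, not Hadoop's ClientServiceDelegate API; a real patch would depend on how the delegate learns that the job has reached a terminal state.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class StaleProxySketch {

    /** Hypothetical stand-in for anything that can report a job's state (AM or history server). */
    interface StatusSource {
        String getJobState(String jobId);
    }

    /**
     * Hypothetical delegate, NOT Hadoop's ClientServiceDelegate. It starts with the AM-backed
     * source, but before every call it re-checks whether the job has finished and, if so,
     * switches permanently to the history-server-backed source. The stale behavior described
     * in the issue corresponds to the same class without the check in currentProxy().
     */
    static class RefreshingDelegate {
        private final StatusSource amSource;
        private final StatusSource historySource;
        private final BooleanSupplier jobFinished; // assumption: some authoritative "is the job done?" check
        private StatusSource realProxy;

        RefreshingDelegate(StatusSource amSource, StatusSource historySource, BooleanSupplier jobFinished) {
            this.amSource = amSource;
            this.historySource = historySource;
            this.jobFinished = jobFinished;
            this.realProxy = amSource; // new or running job: talk to the AM first
        }

        private synchronized StatusSource currentProxy() {
            if (realProxy == amSource && jobFinished.getAsBoolean()) {
                realProxy = historySource; // AM is gone; completed jobs are served by the history server
            }
            return realProxy;
        }

        String getJobState(String jobId) {
            return currentProxy().getJobState(jobId);
        }
    }

    public static void main(String[] args) {
        AtomicBoolean finished = new AtomicBoolean(false);
        RefreshingDelegate delegate = new RefreshingDelegate(
                jobId -> "RUNNING",   // simulated AM
                jobId -> "SUCCEEDED", // simulated history server
                finished::get);

        System.out.println(delegate.getJobState("job_1")); // prints RUNNING
        finished.set(true);
        System.out.println(delegate.getJobState("job_1")); // prints SUCCEEDED
    }
}
```

The design choice in this sketch is to make the proxy switch one-way and lazy: it costs one extra state check per call, but a caller such as Hive's polling loop never keeps talking to an AM that has already exited.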
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)