[jira] [Created] (MAPREDUCE-6312) Hive fails due to stale proxy in ClientServiceDelegate

Radim Kubacki (JIRA) Tue, 07 Apr 2015 07:02:58 -0700

Radim Kubacki created MAPREDUCE-6312:
----------------------------------------


             Summary: Hive fails due to stale proxy in ClientServiceDelegate
                 Key: MAPREDUCE-6312
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6312
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: client
    Affects Versions: 2.5.0
            Reporter: Radim Kubacki


ClientServiceDelegate initializes its realProxy field to an AM proxy for a new
or running job. When the job later finishes, it does not update this proxy to
query the history server, and the AM no longer returns valid data for the job.
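
The behavior described above can be illustrated with a small toy model. This
is only a sketch and not the actual ClientServiceDelegate code: the
JobProtocol, Endpoint and Delegate types, the refreshOnMiss flag, and the use
of null for "no valid data" are all assumptions made for illustration. It
contrasts a delegate that caches its first proxy forever with one that
re-resolves the proxy once the cached endpoint stops knowing the job.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleProxySketch {

    /** Hypothetical stand-in for the protocol spoken by the AM and the history server. */
    interface JobProtocol {
        /** Returns the job state, or null if this endpoint does not know the job. */
        String getJobState(String jobId);
    }

    /** Toy endpoint backed by an in-memory map (not a real Hadoop class). */
    static class Endpoint implements JobProtocol {
        final Map<String, String> states = new HashMap<>();

        @Override
        public String getJobState(String jobId) {
            return states.get(jobId);
        }
    }

    /** Toy delegate; refreshOnMiss=false models the caching behavior from the report. */
    static class Delegate {
        private final Endpoint am;
        private final Endpoint history;
        private final boolean refreshOnMiss;
        private JobProtocol realProxy; // cached on first use

        Delegate(Endpoint am, Endpoint history, boolean refreshOnMiss) {
            this.am = am;
            this.history = history;
            this.refreshOnMiss = refreshOnMiss;
        }

        /** Picks the AM while it still knows the job, otherwise the history server. */
        private JobProtocol resolve(String jobId) {
            return am.states.containsKey(jobId) ? am : history;
        }

        String getJobState(String jobId) {
            if (realProxy == null) {
                realProxy = resolve(jobId);
            }
            String state = realProxy.getJobState(jobId);
            if (state == null && refreshOnMiss) {
                // Assumed fix: drop the stale proxy and resolve again.
                realProxy = resolve(jobId);
                state = realProxy.getJobState(jobId);
            }
            return state;
        }
    }

    public static void main(String[] args) {
        Endpoint am = new Endpoint();
        Endpoint history = new Endpoint();
        am.states.put("job_1", "RUNNING");

        Delegate stale = new Delegate(am, history, false);
        Delegate fresh = new Delegate(am, history, true);
        System.out.println("while running: " + stale.getJobState("job_1")
                + " / " + fresh.getJobState("job_1"));

        // The job finishes: the AM forgets it and the history server takes over.
        am.states.remove("job_1");
        history.states.put("job_1", "SUCCEEDED");
        System.out.println("caching delegate:    " + stale.getJobState("job_1"));
        System.out.println("refreshing delegate: " + fresh.getJobState("job_1"));
    }
}
```

In this model the caching delegate keeps asking the AM after the job has
finished, while the refreshing one falls through to the history server.
Whether a real fix should refresh on a miss or on a job state change is left
open here.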

We found this while investigating https://issues.cloudera.org/browse/DISTRO-631,
which describes a Hive failure caused by a polling loop like this:
{code}
  progress(JobClient jc, RunningJob rj) { ...
        while (!rj.isComplete() || (extraRounds > 0)) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
            }

            RunningJob newRj = jc.getJob(rj.getID());
            if (newRj == null) {
                // under exceptional load, hadoop may not be able to look up status
                // of finished jobs (because it has purged them from memory). From
                // hive's perspective - it's equivalent to the job having failed.
                // So raise a meaningful exception
                throw new IOException("Could not find status of job:" + rj.getID());
            } else {
                rj = newRj;
            }
        }
{code}
In this snippet, JobClient.getJob tries to create a RunningJob instance that
refers to the job file in /user/$USER/.staging, even after the job has finished
and the file has been moved to /user/history/done (or
/user/history/done_intermediate).

Note that Hive queries can still succeed, depending on timing, if HDFS performs
the actual file delete with a delay.

We can try to write a patch if there is agreement that this should be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
