[
https://issues.apache.org/jira/browse/MAPREDUCE-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241325#comment-13241325
]
Robert Joseph Evans commented on MAPREDUCE-4033:
------------------------------------------------
It looks like your job and the history server have different configuration
values for where to write and read the jhist files.
Your Oozie job creates the directory
/tmp/hadoop-yarn/staging/history/done_intermediate/test
but the history server looks for jobs under
/home/tucu/src/cloudera/oozietucu/core/target/org.apache.hadoop.mapred.MiniMRCluster/apps_staging_dir/history/done_intermediate
which looks like the MiniCluster overriding the value used when we don't use
HDFS.
So when the test passes, it probably got the status from the AM before the AM
exited. When it fails, it tried to get the status from the history server, but
the history server has no knowledge of your job because the files are not
where it expects them to be. I am not very familiar with the mini cluster, so
I am not sure where to look to fix this.
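
To illustrate the mismatch described above, here is a minimal sketch that
resolves the intermediate done directory from two configurations and compares
them. It uses plain maps rather than Hadoop's Configuration class so that it
runs standalone. The property names are the ones I recall from Hadoop
0.23/2.x and should be verified against your version; the fallback default
and the idea that the MiniCluster overrides the staging directory are
assumptions drawn from the two paths quoted in this comment.

```java
import java.util.HashMap;
import java.util.Map;

final class HistoryDirCheck {
    // Assumed property names (recalled from Hadoop 0.23/2.x; verify locally).
    static final String STAGING_DIR = "yarn.app.mapreduce.am.staging-dir";
    static final String INTERMEDIATE_DONE_DIR =
            "mapreduce.jobhistory.intermediate-done-dir";
    // Assumed default, taken from the path the Oozie job wrote to.
    static final String DEFAULT_STAGING_DIR = "/tmp/hadoop-yarn/staging";

    /** Explicit intermediate dir if set, else derived from the staging dir. */
    static String intermediateDoneDir(Map<String, String> conf) {
        String explicit = conf.get(INTERMEDIATE_DONE_DIR);
        if (explicit != null) {
            return stripTrailingSlashes(explicit);
        }
        String staging = conf.getOrDefault(STAGING_DIR, DEFAULT_STAGING_DIR);
        return stripTrailingSlashes(staging) + "/history/done_intermediate";
    }

    /** Normalize "/a/b/" to "/a/b" so equal directories compare equal. */
    static String stripTrailingSlashes(String path) {
        String p = path;
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    /** True when writer and reader agree on where jhist files live. */
    static boolean sameHistoryDir(Map<String, String> jobConf,
                                  Map<String, String> historyServerConf) {
        return intermediateDoneDir(jobConf)
                .equals(intermediateDoneDir(historyServerConf));
    }

    public static void main(String[] args) {
        // Job side: nothing overridden, so the assumed default applies.
        Map<String, String> jobConf = new HashMap<>();
        // History server side: staging dir overridden (assumption), using
        // the path quoted in the comment.
        Map<String, String> hsConf = new HashMap<>();
        hsConf.put(STAGING_DIR,
                "/home/tucu/src/cloudera/oozietucu/core/target/"
                + "org.apache.hadoop.mapred.MiniMRCluster/apps_staging_dir");

        System.out.println("job writes to:    " + intermediateDoneDir(jobConf));
        System.out.println("server reads from: " + intermediateDoneDir(hsConf));
        System.out.println("match: " + sameHistoryDir(jobConf, hsConf));
    }
}
```

A check like this in the test setup would flag the disagreement before any
job is submitted, instead of surfacing later as an unknown job.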
> time lag between job completion and job being avail in JH server makes Oozie
> fail
> ---------------------------------------------------------------------------------
>
> Key: MAPREDUCE-4033
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4033
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.3
> Reporter: Alejandro Abdelnur
> Priority: Critical
> Fix For: 2.0.0
>
> Attachments: minicluster-oozie-pig.txt
>
>
> Oozie testcases are failing randomly because MR2 reports the job as unknown.
> This seems to happen when Oozie queries via JobClient.getJob(<JOBID>) for a
> <JOBID> that just finished.
> {code}
> org.apache.oozie.action.ActionExecutorException: JA017: Unknown hadoop job
> [job_1332176678205_0011] associated with action
> [0000000-120319101023910-oozie-tucu-W@pig-action]. Failing this action!
> {code}
> Oozie reports this error when JobClient.getJob(<JOBID>) returns NULL.
> Looking at the mini cluster logs, the job definitely ran.
> {code}
> find . -name "*1332176678205_0011*"
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011/container_1332176678205_0011_01_000001
> {code}
> It seems there is a gap until the job is available in the JH server.
> If this gap is unavoidable we need to ensure Oozie always waits at least the
> gap time before querying for a job.
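
If the gap described in the issue turns out to be unavoidable, one way a
client could wait it out is a bounded retry around the lookup. The sketch
below is not Oozie's code: the class and method names are hypothetical, and
the lookup is passed in as a Supplier so the example runs without Hadoop on
the classpath. A real caller would wrap JobClient.getJob and its checked
IOException inside the supplier. Note that retrying only helps with a genuine
time lag; it does not help when the history server reads from a different
directory than the job writes to.

```java
import java.util.function.Supplier;

final class JobLookupRetry {
    /**
     * Calls lookup until it returns non-null or maxAttempts is reached,
     * sleeping sleepMillis between attempts. Returns null if the job never
     * shows up or the waiting thread is interrupted.
     */
    static <T> T pollUntilFound(Supplier<T> lookup, int maxAttempts,
                                long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = lookup.get();
            if (result != null) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt for the caller and give up.
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for a lookup that returns null for the first two calls,
        // mimicking a job that is not yet visible in the history server.
        int[] calls = {0};
        String job = pollUntilFound(
                () -> ++calls[0] < 3 ? null : "job_1332176678205_0011",
                5, 10L);
        System.out.println(job + " found after " + calls[0] + " attempts");
    }
}
```

Only after the final attempt still returns null would the caller treat the
job as unknown and fail the action.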