[ https://issues.apache.org/jira/browse/HIVE-17941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
BELUGA BEHR updated HIVE-17941:
-------------------------------
Summary: Don't Re-Create RunningJob Client During Status Checks (was:
Don't Re-Create Running Job Client During Status Checks)
> Don't Re-Create RunningJob Client During Status Checks
> ------------------------------------------------------
>
> Key: HIVE-17941
> URL: https://issues.apache.org/jira/browse/HIVE-17941
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Affects Versions: 3.0.0, 2.3.1
> Reporter: BELUGA BEHR
>
> {code:java|title=org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper}
> while (!rj.isComplete()) {
>   ...
>   RunningJob newRj = jc.getJob(rj.getID());
>   if (newRj == null) {
>     // under exceptional load, hadoop may not be able to look up status
>     // of finished jobs (because it has purged them from memory). From
>     // hive's perspective - it's equivalent to the job having failed.
>     // So raise a meaningful exception
>     throw new IOException("Could not find status of job:" + rj.getID());
>   } else {
>     th.setRunningJob(newRj);
>     rj = newRj;
>   }
> }
> ...
> }
> {code}
> https://github.com/apache/hive/blob/a9f25c0e7ad3f81a9f00f601947a161516e33f1b/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java#L295-L306
> Every time we loop here for a status update, we are rebuilding the RunningJob
> object to test if the Job information is still loaded in YARN. Rebuilding
> this RunningJob object is not trivial because it requires that we re-load and
> parse the Job Configuration XML file every time.
> {code:java|title=Outdated Stacktrace But Same Idea Holds}
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.<init>(FileInputStream.java:120)
> at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1924)
> at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1877)
> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1785)
> at org.apache.hadoop.conf.Configuration.get(Configuration.java:712)
> at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1951)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:398)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:388)
> at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:174)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:655)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:668)
> at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:282)
> at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:532)
> {code}
> Maybe we can use {{isRetired()}} for this particular check instead. We also
> probably need to be better about checking the return values of the
> {{RunningJob}} methods, since any of them can fail at any time once YARN
> purges the job information. The current code seems to be an attempt to
> detect a purged job before exercising the {{RunningJob}} object, even though
> the object can go bad at any point.
> https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapred/RunningJob.html
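A rough sketch of the idea above, not the actual patch: keep polling the same handle and ask it whether the job has been retired, rather than fetching a new RunningJob on every pass. To stay self-contained, the sketch uses a hypothetical JobHandle interface that stands in for the two RunningJob methods involved (isComplete() and isRetired()). The names waitForCompletion and fakeJob and the poll interval are likewise made up for illustration. Whether isRetired() reliably reports a purged job on YARN is an assumption that would need to be verified.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class StatusPollSketch {

    /** Hypothetical stand-in for the two RunningJob methods used in the loop. */
    interface JobHandle {
        boolean isComplete() throws IOException;
        boolean isRetired() throws IOException;
    }

    /**
     * Polls one handle until the job completes and returns the number of
     * polls. The handle is never re-created, so no job configuration is
     * re-loaded between status checks.
     */
    static int waitForCompletion(JobHandle rj, String jobId, long pollMillis)
            throws IOException, InterruptedException {
        int polls = 0;
        while (!rj.isComplete()) {
            polls++;
            // Assumption: a retired job is one whose status can no longer be
            // looked up, which the original code treats as a failure.
            if (rj.isRetired()) {
                throw new IOException("Could not find status of job:" + jobId);
            }
            Thread.sleep(pollMillis);
        }
        return polls;
    }

    /** Fake job that reports completion after the given number of checks. */
    static JobHandle fakeJob(int checksUntilDone, boolean retired) {
        AtomicInteger checks = new AtomicInteger();
        return new JobHandle() {
            @Override
            public boolean isComplete() {
                return checks.incrementAndGet() > checksUntilDone;
            }

            @Override
            public boolean isRetired() {
                return retired;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        // Three checks report "not complete", so the loop body runs three times.
        System.out.println(waitForCompletion(fakeJob(3, false), "job_1", 1L));
    }
}
```

Because any call on the handle can still fail once the job information is gone, a real change would probably also need to handle exceptions and unexpected return values from the other RunningJob methods used inside the loop.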
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)