[ https://issues.apache.org/jira/browse/GRIFFIN-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625350#comment-16625350 ]
ASF GitHub Bot commented on GRIFFIN-197:
----------------------------------------
Github user chemikadze commented on a diff in the pull request:
https://github.com/apache/incubator-griffin/pull/421#discussion_r219724832
--- Diff: service/src/main/java/org/apache/griffin/core/util/YarnNetUtil.java ---
@@ -56,6 +62,14 @@ public static boolean update(String url, JobInstanceBean instance) {
instance.setState(LivySessionStates.toLivyState(state));
}
return true;
+ } catch (HttpClientErrorException e) {
+ LOGGER.warn("client error {} from yarn: {}",
+ e.getMessage(), e.getResponseBodyAsString());
+ if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
+ // in sync with Livy behavior, see com.cloudera.livy.utils.SparkYarnApp
+ instance.setState(DEAD);
--- End diff --
Only 404 is handled here, and a 404 should not be the result of a network issue.
It looks like any kind of error reported by the Yarn client (after its internal
retries) results in the job being marked DEAD on the Livy side:
https://github.com/cloudera/livy/blob/master/server/src/main/scala/com/cloudera/livy/utils/SparkYarnApp.scala#L307
I'll need to double-check whether not-found applications are ever retried, to
make sure the behavior is the same as on the Livy side. If they are not, then
marking them DEAD is what Livy would do.
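The difference between the two behaviors discussed above can be sketched as a pure function from the HTTP status returned by the Yarn ResourceManager to a job state. This is a minimal, stdlib-only sketch under stated assumptions: the `YarnErrorPolicy` class, its `JobState` enum and both policy methods are hypothetical stand-ins (not Griffin's `YarnNetUtil` or `LivySessionStates`), the status is passed as a plain `int` rather than Spring's `HttpStatus`, and the Livy-like variant assumes the client's own retries have already been exhausted.

```java
public class YarnErrorPolicy {

    // Hypothetical subset of job states; Griffin's real mapping lives in LivySessionStates.
    enum JobState { UNKNOWN, RUNNING, DEAD }

    static final int NOT_FOUND = 404;

    // Policy in the diff: only "404 Not Found" (the application is unknown to the RM)
    // marks the instance DEAD; any other error keeps the current state.
    static JobState griffinPolicy(int status, JobState current) {
        return status == NOT_FOUND ? JobState.DEAD : current;
    }

    // Policy the review attributes to Livy's SparkYarnApp: any error status still
    // reported after the client's own retries (assumed exhausted here) marks it DEAD.
    static JobState livyLikePolicy(int status, JobState current) {
        return status >= 400 ? JobState.DEAD : current;
    }

    public static void main(String[] args) {
        for (int status : new int[] {404, 403, 503}) {
            System.out.printf("%d: griffin=%s, livy-like=%s%n",
                    status,
                    griffinPolicy(status, JobState.UNKNOWN),
                    livyLikePolicy(status, JobState.UNKNOWN));
        }
    }
}
```

Running `main` prints `DEAD` for both policies on 404, while for 403 and 503 the Griffin-style policy keeps `UNKNOWN` and only the Livy-like policy returns `DEAD`. Keeping the decision separate from the HTTP call makes the open retry question easy to test on its own.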
> Job is left in UNKNOWN state by Service if Yarn RM is restarted
> ---------------------------------------------------------------
>
> Key: GRIFFIN-197
> URL: https://issues.apache.org/jira/browse/GRIFFIN-197
> Project: Griffin (Incubating)
> Issue Type: Bug
> Reporter: Nikolay Sokolov
> Priority: Minor
>
> On the one hand, according to Livy behavior, a missing app can be treated as
> being in the DEAD state.
> On the other hand, logged client errors pollute the log with unnecessary
> stack traces while not showing the error description returned by Yarn.
> Sample stack trace on the Service side:
> {code:none}
> 2018-09-21 14:30:58.016 WARN 14699 --- [nio-8080-exec-4] o.a.g.c.j.JobServiceImpl : sessionId(300) appId(application_1534940268145_0318) 404 Not Found.
> 2018-09-21 14:30:58.016 WARN 14699 --- [nio-8080-exec-4] o.a.g.c.j.JobServiceImpl : Spark session 300 may be overdue! Now we use yarn to update state.
> 2018-09-21 14:30:58.020 ERROR 14699 --- [nio-8080-exec-4] o.a.g.c.u.YarnNetUtil : update exception happens by yarn. {}
> org.springframework.web.client.HttpClientErrorException: 404 Not Found
> at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:91) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:700) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:653) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:613) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:287) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> at org.apache.griffin.core.util.YarnNetUtil.update(YarnNetUtil.java:53) [classes!/:0.3.1-incubating-SNAPSHOT]
> at org.apache.griffin.core.job.JobServiceImpl.setStateByYarn(JobServiceImpl.java:569) [classes!/:0.3.1-incubating-SNAPSHOT]
> at org.apache.griffin.core.job.JobServiceImpl.setStateByYarn(JobServiceImpl.java:530) [classes!/:0.3.1-incubating-SNAPSHOT]
> at org.apache.griffin.core.job.JobServiceImpl.syncInstancesOfJob(JobServiceImpl.java:514) [classes!/:0.3.1-incubating-SNAPSHOT]
> at org.apache.griffin.core.job.JobServiceImpl.updateState(JobServiceImpl.java:274) [classes!/:0.3.1-incubating-SNAPSHOT]
> at org.apache.griffin.core.job.JobServiceImpl.findInstancesOfJob(JobServiceImpl.java:267) [classes!/:0.3.1-incubating-SNAPSHOT]
> at org.apache.griffin.core.job.JobController.findInstancesOfJob(JobController.java:94) [classes!/:0.3.1-incubating-SNAPSHOT]
> at sun.reflect.GeneratedMethodAccessor124.invoke(Unknown Source) ~[?:?]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
> at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205) [spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> {code}
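The logging half of the issue can be addressed by logging the status and the body returned by Yarn as plain message arguments, which is what the `LOGGER.warn(...)` call in the diff does. The literal `{}` in the ERROR line of the stack trace suggests the exception was passed as the only argument, which SLF4J treats as the throwable to print rather than as a placeholder value. A minimal stdlib-only sketch of building such a one-line message; the `YarnErrorMessage` class, its `format` helper and the `MAX_BODY_CHARS` cap are hypothetical, and `java.util.logging` stands in for Griffin's actual logger.

```java
import java.util.logging.Logger;

public class YarnErrorMessage {

    private static final Logger LOG = Logger.getLogger(YarnErrorMessage.class.getName());

    // Hypothetical cap so a large error page from the ResourceManager cannot flood the log.
    static final int MAX_BODY_CHARS = 200;

    // Builds a single-line warning from the HTTP status text and the response body.
    // Only strings are involved, so the logger has no Throwable and prints no stack trace.
    static String format(String status, String responseBody) {
        // Collapse newlines and runs of whitespace so the entry stays on one line.
        String body = responseBody == null ? "" : responseBody.replaceAll("\\s+", " ").trim();
        if (body.length() > MAX_BODY_CHARS) {
            body = body.substring(0, MAX_BODY_CHARS) + "...";
        }
        return String.format("client error %s from yarn: %s", status, body);
    }

    public static void main(String[] args) {
        // Illustrative body only; not captured from a real ResourceManager.
        String body = "{\"RemoteException\": {\"message\": \"application not found\"}}";
        LOG.warning(format("404 Not Found", body));
    }
}
```

The point of the design is that the description returned by Yarn ends up in the log line itself, while the stack trace, which only shows Spring's `RestTemplate` internals, is left out.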
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)