[ https://issues.apache.org/jira/browse/GRIFFIN-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624678#comment-16624678 ]

ASF GitHub Bot commented on GRIFFIN-197:
----------------------------------------

Github user guoyuepeng commented on a diff in the pull request:

    https://github.com/apache/incubator-griffin/pull/421#discussion_r219672109
  
    --- Diff: service/src/main/java/org/apache/griffin/core/util/YarnNetUtil.java ---
    @@ -56,6 +62,14 @@ public static boolean update(String url, JobInstanceBean instance) {
                     instance.setState(LivySessionStates.toLivyState(state));
                 }
                 return true;
    +        } catch (HttpClientErrorException e) {
    +            LOGGER.warn("client error {} from yarn: {}",
    +                    e.getMessage(), e.getResponseBodyAsString());
    +            if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
    +                // in sync with Livy behavior, see com.cloudera.livy.utils.SparkYarnApp
    +                instance.setState(DEAD);
    --- End diff --
    
    Agreed, we need to handle the state.
    But what if this is caused by a network issue?
    Should we double-check before jumping to the conclusion that the instance is dead?
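
One possible way to "double confirm" is sketched below. It is only an illustration under stated assumptions, not code from the pull request: the class `NotFoundConfirmation`, the method `confirmNotFound`, and the idea of passing the Yarn status lookup in as an `IntSupplier` are all hypothetical. The sketch repeats the lookup a few times and trusts the 404 only if every attempt returns 404; any other status, or any exception such as a connection failure, is treated as "not confirmed".

```java
import java.util.function.IntSupplier;

/** Hypothetical helper, not part of Griffin: re-checks a 404 before it is trusted. */
final class NotFoundConfirmation {

    static final int NOT_FOUND = 404;

    private NotFoundConfirmation() {
    }

    /**
     * Returns true only if each of {@code attempts} probes answers 404.
     * A different status code or a thrown runtime exception (for example
     * a connection failure) means the 404 is not confirmed.
     *
     * @param statusProbe assumed callback that performs the lookup and returns the HTTP status
     * @param attempts    number of probes to run, at least 1
     * @param pauseMillis pause between probes; 0 disables the pause
     */
    static boolean confirmNotFound(IntSupplier statusProbe, int attempts, long pauseMillis) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        for (int i = 0; i < attempts; i++) {
            try {
                if (statusProbe.getAsInt() != NOT_FOUND) {
                    return false; // the application was seen again, or another error occurred
                }
            } catch (RuntimeException e) {
                return false; // could not reach the server, so nothing is confirmed
            }
            boolean moreProbesFollow = i < attempts - 1;
            if (moreProbesFollow && pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false; // interrupted before the check completed
                }
            }
        }
        return true;
    }
}
```

With something like this, the `catch (HttpClientErrorException e)` branch in the diff would call `instance.setState(DEAD)` only when `confirmNotFound` returns true, and would otherwise leave the state unchanged until the next sync. The number of attempts and the pause are tuning choices that the thread does not settle.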


> Job is left in UNKNOWN state by Service if Yarn RM is restarted
> ---------------------------------------------------------------
>
>                 Key: GRIFFIN-197
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-197
>             Project: Griffin (Incubating)
>          Issue Type: Bug
>            Reporter: Nikolay Sokolov
>            Priority: Minor
>
> On the one hand, according to Livy behavior, a missing app can be treated as
> being in the DEAD state.
> On the other hand, the logged client errors pollute the log with unnecessary
> stack traces but do not show the error description returned by Yarn.
> Sample stack trace on Service side:
> {code:none}
> 2018-09-21 14:30:58.016  WARN 14699 --- [nio-8080-exec-4] o.a.g.c.j.JobServiceImpl                 : sessionId(300) appId(application_1534940268145_0318) 404 Not Found.
> 2018-09-21 14:30:58.016  WARN 14699 --- [nio-8080-exec-4] o.a.g.c.j.JobServiceImpl                 : Spark session 300 may be overdue! Now we use yarn to update state.
> 2018-09-21 14:30:58.020 ERROR 14699 --- [nio-8080-exec-4] o.a.g.c.u.YarnNetUtil                    : update exception happens by yarn. {}
> org.springframework.web.client.HttpClientErrorException: 404 Not Found
>         at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:91) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
>         at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:700) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
>         at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:653) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
>         at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:613) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
>         at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:287) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
>         at org.apache.griffin.core.util.YarnNetUtil.update(YarnNetUtil.java:53) [classes!/:0.3.1-incubating-SNAPSHOT]
>         at org.apache.griffin.core.job.JobServiceImpl.setStateByYarn(JobServiceImpl.java:569) [classes!/:0.3.1-incubating-SNAPSHOT]
>         at org.apache.griffin.core.job.JobServiceImpl.setStateByYarn(JobServiceImpl.java:530) [classes!/:0.3.1-incubating-SNAPSHOT]
>         at org.apache.griffin.core.job.JobServiceImpl.syncInstancesOfJob(JobServiceImpl.java:514) [classes!/:0.3.1-incubating-SNAPSHOT]
>         at org.apache.griffin.core.job.JobServiceImpl.updateState(JobServiceImpl.java:274) [classes!/:0.3.1-incubating-SNAPSHOT]
>         at org.apache.griffin.core.job.JobServiceImpl.findInstancesOfJob(JobServiceImpl.java:267) [classes!/:0.3.1-incubating-SNAPSHOT]
>         at org.apache.griffin.core.job.JobController.findInstancesOfJob(JobController.java:94) [classes!/:0.3.1-incubating-SNAPSHOT]
>         at sun.reflect.GeneratedMethodAccessor124.invoke(Unknown Source) ~[?:?]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
>         at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
>         at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205) [spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> {code}



