Nikolay Sokolov created GRIFFIN-197:
---------------------------------------

             Summary: Job is left in UNKNOWN state by Service if Yarn RM is restarted
                 Key: GRIFFIN-197
                 URL: https://issues.apache.org/jira/browse/GRIFFIN-197
             Project: Griffin (Incubating)
          Issue Type: Bug
            Reporter: Nikolay Sokolov


On the one hand, following Livy's behavior, a missing app can be treated as being in the DEAD state.
On the other hand, the logged client errors pollute the log with unnecessary stack traces while not showing the error description returned by Yarn.
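A minimal sketch of the first point, assuming the service has the HTTP status of the Yarn RM lookup at hand (the trace shows `YarnNetUtil.update` calling `RestTemplate.getForObject`, whose default error handler throws `HttpClientErrorException` on a 404). The class `YarnLookupStates`, its `State` enum and `stateForStatus` are hypothetical names for illustration, not Griffin's API. Keeping non-404 failures as UNKNOWN is also an assumption, on the reasoning that a 5xx from the RM may be transient and worth retrying.

```java
import java.util.Optional;

public class YarnLookupStates {

    // Hypothetical subset of job states, for illustration only.
    enum State { DEAD, UNKNOWN }

    /**
     * Decides what to record when the Yarn RM lookup for an application
     * did not return a usable application report.
     *
     * @param httpStatus status code of the RM response
     * @return DEAD for 404 (the RM no longer knows the app, e.g. after a restart);
     *         UNKNOWN for any other non-2xx code (possibly transient);
     *         empty for 2xx, where the caller should parse the returned report instead
     */
    static Optional<State> stateForStatus(int httpStatus) {
        if (httpStatus >= 200 && httpStatus < 300) {
            return Optional.empty();
        }
        if (httpStatus == 404) {
            return Optional.of(State.DEAD);
        }
        return Optional.of(State.UNKNOWN);
    }

    public static void main(String[] args) {
        System.out.println(stateForStatus(404)); // Optional[DEAD]
        System.out.println(stateForStatus(503)); // Optional[UNKNOWN]
        System.out.println(stateForStatus(200)); // Optional.empty
    }
}
```

In a Spring 4.3 code path, a caller catching `HttpClientErrorException` could feed `e.getStatusCode().value()` into such a helper instead of leaving the instance in UNKNOWN.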

Sample stack trace on the Service side:
{code:none}
2018-09-21 14:30:58.016  WARN 14699 --- [nio-8080-exec-4] o.a.g.c.j.JobServiceImpl                 : sessionId(300) appId(application_1534940268145_0318) 404 Not Found.
2018-09-21 14:30:58.016  WARN 14699 --- [nio-8080-exec-4] o.a.g.c.j.JobServiceImpl                 : Spark session 300 may be overdue! Now we use yarn to update state.
2018-09-21 14:30:58.020 ERROR 14699 --- [nio-8080-exec-4] o.a.g.c.u.YarnNetUtil                    : update exception happens by yarn. {}

org.springframework.web.client.HttpClientErrorException: 404 Not Found
        at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:91) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
        at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:700) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
        at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:653) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
        at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:613) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
        at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:287) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
        at org.apache.griffin.core.util.YarnNetUtil.update(YarnNetUtil.java:53) [classes!/:0.3.1-incubating-SNAPSHOT]
        at org.apache.griffin.core.job.JobServiceImpl.setStateByYarn(JobServiceImpl.java:569) [classes!/:0.3.1-incubating-SNAPSHOT]
        at org.apache.griffin.core.job.JobServiceImpl.setStateByYarn(JobServiceImpl.java:530) [classes!/:0.3.1-incubating-SNAPSHOT]
        at org.apache.griffin.core.job.JobServiceImpl.syncInstancesOfJob(JobServiceImpl.java:514) [classes!/:0.3.1-incubating-SNAPSHOT]
        at org.apache.griffin.core.job.JobServiceImpl.updateState(JobServiceImpl.java:274) [classes!/:0.3.1-incubating-SNAPSHOT]
        at org.apache.griffin.core.job.JobServiceImpl.findInstancesOfJob(JobServiceImpl.java:267) [classes!/:0.3.1-incubating-SNAPSHOT]
        at org.apache.griffin.core.job.JobController.findInstancesOfJob(JobController.java:94) [classes!/:0.3.1-incubating-SNAPSHOT]
        at sun.reflect.GeneratedMethodAccessor124.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205) [spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
{code}
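The literal `{}` at the end of the ERROR line suggests, assuming the service logs through SLF4J, that the exception was passed as the only argument to a `{}` placeholder: SLF4J treats a trailing `Throwable` argument as the exception to print rather than a value to substitute, so the placeholder stays unfilled and the full stack trace is emitted instead. A minimal sketch of the second point is a single-line message that carries Yarn's response body; in Spring, `HttpClientErrorException` exposes `getStatusCode()` and `getResponseBodyAsString()`, which could supply the inputs. `YarnErrorMessages`, `describe` and the 300-character cap are hypothetical choices, not Griffin code.

```java
public class YarnErrorMessages {

    // Hypothetical cap so a large HTML/JSON error page cannot flood the log.
    static final int MAX_BODY_CHARS = 300;

    /**
     * Builds a one-line log message for a failed Yarn RM lookup that includes
     * the error description returned by Yarn instead of a Java stack trace.
     */
    static String describe(String appId, int httpStatus, String responseBody) {
        // Collapse newlines and runs of whitespace so the message stays on one line.
        String body = responseBody == null ? "" : responseBody.trim().replaceAll("\\s+", " ");
        if (body.length() > MAX_BODY_CHARS) {
            body = body.substring(0, MAX_BODY_CHARS) + "...";
        }
        return String.format("Yarn RM lookup failed for appId(%s): HTTP %d%s",
                appId, httpStatus, body.isEmpty() ? "" : ", response: " + body);
    }

    public static void main(String[] args) {
        // Illustrative body only; the real text comes from the RM's response.
        String body = "{\"RemoteException\": {\"message\": \"app not found\"}}";
        System.out.println(describe("application_1534940268145_0318", 404, body));
    }
}
```

The resulting string would be logged at WARN without passing the exception object, so the log keeps Yarn's explanation while dropping the stack trace.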





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
