GitHub user chemikadze commented on a diff in the pull request:

    https://github.com/apache/incubator-griffin/pull/421#discussion_r219724832
  
    --- Diff: service/src/main/java/org/apache/griffin/core/util/YarnNetUtil.java ---
    @@ -56,6 +62,14 @@ public static boolean update(String url, JobInstanceBean instance) {
                     instance.setState(LivySessionStates.toLivyState(state));
                 }
                 return true;
    +        } catch (HttpClientErrorException e) {
    +            LOGGER.warn("client error {} from yarn: {}",
    +                    e.getMessage(), e.getResponseBodyAsString());
    +            if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
    +                // in sync with Livy behavior, see com.cloudera.livy.utils.SparkYarnApp
    +                instance.setState(DEAD);
    --- End diff --
    
    Only 404 is handled here, and a 404 should not be the result of a network issue.
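A minimal sketch of the distinction this point relies on, assuming the caller reports the HTTP status it received from YARN, or -1 when no response arrived at all (for example a timeout or a refused connection). Only a 404 is treated as terminal; anything else leaves the current state alone so a later poll can try again. `YarnErrorPolicy`, `JobState`, and `onLookupFailure` are hypothetical names, not Griffin's API, and the sketch avoids Spring types to stay self-contained. (With Spring's `RestTemplate`, I/O failures typically surface as `ResourceAccessException` rather than `HttpClientErrorException`, so a catch of the latter does not see network problems at all.)

```java
import java.util.Optional;

/**
 * Hypothetical sketch: decide whether a failed YARN status lookup should
 * mark the job DEAD. Names are illustrative, not Griffin's actual API.
 */
public class YarnErrorPolicy {

    enum JobState { RUNNING, DEAD }

    /**
     * Returns the new state for a failed lookup, or empty to keep the
     * current state and let the next poll retry.
     *
     * @param httpStatus HTTP status returned by YARN, or -1 if no response
     *                   was received (assumption of this sketch)
     */
    static Optional<JobState> onLookupFailure(int httpStatus) {
        if (httpStatus == 404) {
            // YARN answered and does not know the application: this is a
            // real response, not a network failure, so treat it as terminal.
            return Optional.of(JobState.DEAD);
        }
        // Any other status, or no response at all, may be transient.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(onLookupFailure(404)); // Optional[DEAD]
        System.out.println(onLookupFailure(503)); // Optional.empty
        System.out.println(onLookupFailure(-1));  // Optional.empty
    }
}
```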
    
    It looks like any kind of error reported by the YARN client (after its internal retries) results in the job being marked DEAD on the Livy side:
    https://github.com/cloudera/livy/blob/master/server/src/main/scala/com/cloudera/livy/utils/SparkYarnApp.scala#L307
    I'll need to double-check whether not-found applications are ever retried, to make sure the behavior matches Livy's. If they are not, then marking them DEAD is exactly what Livy would do.
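For comparison, a simplified model of the Livy behavior described in this comment: the lookup is retried a bounded number of times, and if every attempt fails the job is marked DEAD regardless of the error type. This is a sketch under stated assumptions, not a port of `SparkYarnApp`; `RetryThenDead`, `pollWithRetries`, and the attempt budget are hypothetical.

```java
import java.util.function.Supplier;

/**
 * Hypothetical model of "any error after retries is fatal".
 * Not Livy's or Griffin's actual code.
 */
public class RetryThenDead {

    enum JobState { RUNNING, FINISHED, DEAD }

    /**
     * Calls lookup up to maxAttempts times and returns the first successful
     * result. If every attempt throws, the job is considered DEAD.
     */
    static JobState pollWithRetries(Supplier<JobState> lookup, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lookup.get();
            } catch (RuntimeException e) {
                // Ignore and retry; a real client would log and back off here.
            }
        }
        return JobState.DEAD;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds: fits within a budget of 3 attempts.
        Supplier<JobState> flaky = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient");
            }
            return JobState.RUNNING;
        };
        System.out.println(pollWithRetries(flaky, 3)); // RUNNING

        // Always fails (e.g. a persistent 404): retries run out, job is DEAD.
        Supplier<JobState> broken = () -> {
            throw new IllegalStateException("not found");
        };
        System.out.println(pollWithRetries(broken, 3)); // DEAD
    }
}
```

Under this model a 404 that keeps coming back on every retry also ends up DEAD, only later; whether YARN's not-found case goes through those retries is the open question raised in the comment.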

