GitHub user guoyuepeng commented on a diff in the pull request:
https://github.com/apache/incubator-griffin/pull/421#discussion_r219672109
--- Diff:
service/src/main/java/org/apache/griffin/core/util/YarnNetUtil.java ---
@@ -56,6 +62,14 @@ public static boolean update(String url, JobInstanceBean instance) {
instance.setState(LivySessionStates.toLivyState(state));
}
return true;
+ } catch (HttpClientErrorException e) {
+ LOGGER.warn("client error {} from yarn: {}",
+ e.getMessage(), e.getResponseBodyAsString());
+ if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
+ // in sync with Livy behavior, see com.cloudera.livy.utils.SparkYarnApp
+ instance.setState(DEAD);
--- End diff --
Agreed, we need to handle the state.
But what if this is caused by a network issue?
Should we double-check before concluding that the instance is
dead?
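
A rough sketch of what such a double-check could look like, for discussion only. The names `DeadStateConfirmer`, `confirmNotFound`, and the `statusProbe` callback are hypothetical and are not existing Griffin or Livy code; the probe is assumed to re-query YARN and return the HTTP status code. The idea is to mark the instance dead only if every repeated probe still returns 404:

```java
import java.util.function.IntSupplier;

// Hypothetical helper, not part of Griffin: re-checks a 404 before trusting it.
final class DeadStateConfirmer {
    static final int NOT_FOUND = 404;

    // Returns true only if all `attempts` probes report 404.
    // `statusProbe` is an assumed callback that re-queries YARN and
    // returns the HTTP status code of the response.
    static boolean confirmNotFound(IntSupplier statusProbe,
                                   int attempts,
                                   long delayMillis) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        for (int i = 0; i < attempts; i++) {
            if (statusProbe.getAsInt() != NOT_FOUND) {
                // Any other status means the 404 was not stable.
                return false;
            }
            if (i < attempts - 1) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the flag and do not confirm on interruption.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return true;
    }
}
```

In the catch block, `instance.setState(DEAD)` would then be called only when `confirmNotFound(...)` returns true; otherwise the state is left unchanged until the next sync.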
---