abstractdog commented on code in PR #266:
URL: https://github.com/apache/tez/pull/266#discussion_r1113971444
##########
tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java:
##########
@@ -1919,6 +1919,13 @@ private DAGRecoveryData recoverDAG() throws IOException,
TezException {
RecoveryParser recoveryParser = new RecoveryParser(
this, recoveryFS, recoveryDataDir, appAttemptID.getAttemptId());
DAGRecoveryData recoveredDAGData =
recoveryParser.parseRecoveryData();
+
+ if(Objects.isNull(recoveredDAGData) && amConf.getBoolean(
+ TezConfiguration.TEZ_AM_FAILURE_ON_MISSING_RECOVERY_DATA,
+
TezConfiguration.TEZ_AM_FAILURE_ON_MISSING_RECOVERY_DATA_DEFAULT)) {
+ throw new IOException("Found nothing to recover in
currentAttemptId= "
Review Comment:
a couple of comments here:
1. I think it would make sense to extract the whole parse logic to another
method (from the instantiation of the recoveryParser) to prevent recoverDAG
grow further
2. regarding the actual fix: Objects.isNull tells almost nothing about the
scenario that has been discussed in jira, so you should describe it in a code
comment here, touching the most important points (AM shutdown prematurely, so
as the recovery service, recovery stream is not closed, there is no valid
recovery data, recovery parser returns null instead of failing, etc.), also,
describe the assumption here: callers of this method should catch this
exception and make the AM finish with state of DAGAppMasterState.ERROR (this is
currently true, but after a refactor, it can simply collide, as we don't do the
actual AM failure logic here)
3. what about having the recoveryDataDir path in the exception message to
help the log reader investigate further?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]