[
https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510319#comment-14510319
]
Hitesh Shah commented on TEZ-2303:
----------------------------------
The DAG is being sent the recover event before all services are started. This
will start generating events (to the dispatcher as well as to
history/recovery, etc.). If an error occurs, the shutdownHandler is invoked,
which will hit issues because the services will not have started.
This should unregister with the RM under normal circumstances. Maybe open a
separate jira to handle the diagnostics in the following section:
{code}
} catch (IOException e) {
  LOG.error("Error occurred when trying to recover data from previous attempt."
      + " Shutting down AM", e);
  this.state = DAGAppMasterState.ERROR;
  this.taskSchedulerEventHandler.setShouldUnregisterFlag();
  shutdownHandler.shutdown();
  return;
}
{code}
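To make the ordering concern concrete: the catch block above can only shut down
cleanly if every service was started before the recover event was dispatched.
Below is a minimal sketch of such a guard. All names in it (SketchService,
RecoveryOrderingSketch, sendRecoverEvent) are hypothetical and are not Tez or
Hadoop classes; it only illustrates the idea of refusing to dispatch recovery
until all services report started.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types exist in Tez or Hadoop.
class RecoveryOrderingSketch {
  // Names of services that have not been started yet.
  static List<String> notStarted(List<SketchService> services) {
    List<String> pending = new ArrayList<>();
    for (SketchService s : services) {
      if (!s.isStarted()) {
        pending.add(s.getName());
      }
    }
    return pending;
  }

  // Runs the recover event only when every service is started.
  // Returns false, without running the event, otherwise.
  static boolean sendRecoverEvent(List<SketchService> services, Runnable recoverEvent) {
    if (!notStarted(services).isEmpty()) {
      return false;
    }
    recoverEvent.run();
    return true;
  }

  public static void main(String[] args) {
    List<SketchService> services =
        Arrays.asList(new SketchService("dispatcher"), new SketchService("history"));
    Runnable recover = () -> System.out.println("recover event dispatched");

    // Too early: nothing is started, so the event is held back.
    System.out.println("sent before start: " + sendRecoverEvent(services, recover));

    for (SketchService s : services) {
      s.start();
    }
    // All services are up, so recovery (and any later shutdown) is safe.
    System.out.println("sent after start: " + sendRecoverEvent(services, recover));
  }
}

// Hypothetical stand-in for an AM service with a started flag.
class SketchService {
  private final String name;
  private boolean started;

  SketchService(String name) {
    this.name = name;
  }

  void start() {
    started = true;
  }

  boolean isStarted() {
    return started;
  }

  String getName() {
    return name;
  }
}
```

With a guard like this, an error during recovery would reach the shutdown path
only after the services it needs to stop have actually been started.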
Is there a way to hold off accepting connections from clients only until the
DAG is recovered? Not starting just that service also has problems, as I believe
the YarnSchedulerService depends on it for the host:port info.
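One possible shape for this, purely as a sketch under assumptions: start the
client-facing service as usual so its host:port is available to whatever
depends on it, but reject requests until recovery is marked complete. The class
and method names below (GatedClientServiceSketch, markRecovered, handle) are
hypothetical and do not correspond to any existing Tez API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; not a Tez class.
class GatedClientServiceSketch {
  private final String host;
  private final int port;
  // Flipped once DAG recovery has finished.
  private final AtomicBoolean recovered = new AtomicBoolean(false);

  GatedClientServiceSketch(String host, int port) {
    this.host = host;
    this.port = port;
  }

  // Available as soon as the service is constructed, so dependents
  // can read host:port before recovery completes.
  String getBindAddress() {
    return host + ":" + port;
  }

  void markRecovered() {
    recovered.set(true);
  }

  boolean isAcceptingRequests() {
    return recovered.get();
  }

  // Rejects client requests until recovery is done.
  String handle(String request) {
    if (!recovered.get()) {
      throw new IllegalStateException("AM is recovering, retry later");
    }
    return "accepted: " + request;
  }

  public static void main(String[] args) {
    GatedClientServiceSketch svc = new GatedClientServiceSketch("am-host", 12345);
    System.out.println("bind address: " + svc.getBindAddress());
    try {
      svc.handle("getDAGStatus");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    svc.markRecovered();
    System.out.println(svc.handle("getDAGStatus"));
  }
}
```

Whether clients should be rejected outright or blocked until recovery finishes
is a separate design choice; the sketch rejects so that callers can retry.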
> ConcurrentModificationException while processing recovery
> ---------------------------------------------------------
>
> Key: TEZ-2303
> URL: https://issues.apache.org/jira/browse/TEZ-2303
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Jason Lowe
> Assignee: Jeff Zhang
> Attachments: TEZ-2303-1.patch, TEZ-2303-2.patch
>
>
> Saw a Tez AM log a few ConcurrentModificationException messages while trying
> to recover from a previous attempt that crashed. Exception details to follow.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)