[ https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976360#comment-14976360 ]
Bikas Saha commented on TEZ-2581:
---------------------------------
tez-dag/src/main/java/org/apache/tez/dag/app/RecoveryParser.java
{code}
public void isRecoverable() {
  // DAG is not recoverable if vertex has committed and has completed the commit
  // but its full recovery events are not seen.
  for (Map.Entry<TezVertexID, VertexRecoveryData> entry : vertexRecoveryData.entrySet()) {
    TezVertexID vertexId = entry.getKey();
    VertexRecoveryData vertexData = entry.getValue();
    if (vertexData.getVertexFinishedEvent() != null) {
      // TODO only see full task events should be OK
      if (vertexCommitStatus.containsKey(vertexId) && vertexCommitStatus.get(vertexId)) {
        this.nonRecoverable = true;
        this.reason = "Vertex has been committed, but its full recovery events are not seen, vertexId=" + vertexId;
        return;
      }
{code}
Didn't quite understand this. What is the reasoning behind this? Also does
"vertexData.getVertexFinishedEvent() != null" imply that full recovery events
have not been seen? Should it be null (based on the code in isVertexFinished())?
Can we put the ".containsKey+.get" check in a method similar to
isVertexGroupCommitted()?
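A minimal sketch of such a helper, assuming the commit status stays in a `Map` keyed by vertex ID. `String` stands in for `TezVertexID`, and `CommitStatusSketch`, `recordCommit` and `isVertexCommitted` are hypothetical names. A single `get` also avoids the unboxing NPE that `containsKey` + `get` would hit if a `null` value were ever stored:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: String replaces TezVertexID so the example is
// self-contained; vertexCommitStatus mirrors the map used in the patch.
class CommitStatusSketch {
  private final Map<String, Boolean> vertexCommitStatus = new HashMap<>();

  void recordCommit(String vertexId, boolean committed) {
    vertexCommitStatus.put(vertexId, committed);
  }

  // One lookup instead of containsKey + get; a missing entry or a null value
  // is treated as "not committed" instead of being unboxed.
  boolean isVertexCommitted(String vertexId) {
    return Boolean.TRUE.equals(vertexCommitStatus.get(vertexId));
  }
}
```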
Why are we separately looking at vertex group commit members? A vertex group
commit is a single operation that commits all member vertices; each member
vertex does not have a separate commit operation.
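To illustrate the point: with one commit record per vertex group, whether a member vertex is committed can be derived from group membership, with no per-member status map. A sketch assuming `String` IDs; `GroupCommitSketch`, its fields and its methods are hypothetical, not the fields in the patch:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: vertex and group IDs are plain Strings.
class GroupCommitSketch {
  // One entry per vertex group commit, not one per member vertex.
  private final Map<String, Boolean> groupCommitStatus = new HashMap<>();
  // Static group membership, e.g. as described by the DAG plan.
  private final Map<String, Set<String>> groupMembers = new HashMap<>();

  void addGroup(String groupName, String... members) {
    groupMembers.put(groupName, new HashSet<>(Arrays.asList(members)));
  }

  void recordGroupCommit(String groupName) {
    groupCommitStatus.put(groupName, Boolean.TRUE);
  }

  // A member counts as committed if any group containing it has committed;
  // no separate per-member commit record is consulted.
  boolean isCommittedViaGroup(String vertexId) {
    for (Map.Entry<String, Set<String>> e : groupMembers.entrySet()) {
      if (e.getValue().contains(vertexId)
          && Boolean.TRUE.equals(groupCommitStatus.get(e.getKey()))) {
        return true;
      }
    }
    return false;
  }
}
```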
{code}
LOG.info("Read HistoryEvent, eventType=" + historyEvent.getEventType() + ", event=" + historyEvent);
LOG.info("attemptRecoveryPath:" + attemptPath);
LOG.info("summaryPath:" + summaryFile);
LOG.info("SummaryFile size:" + summaryFiles.size());
{code}
These and the other new log statements: should they be logged at debug level, or removed?
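If these logs stay, a common pattern is to emit them at debug level behind a level check, so the string concatenation is skipped in normal runs. The sketch below uses java.util.logging only so it runs without dependencies (FINE plays the role of DEBUG); with the project's own logger the equivalent would be a debug-level call. `RecoveryLoggingSketch`, `logReadEvent` and `setVerbose` are hypothetical names:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch using java.util.logging as a dependency-free stand-in for the
// project's logger; FINE stands in for DEBUG.
class RecoveryLoggingSketch {
  private static final Logger LOG =
      Logger.getLogger(RecoveryLoggingSketch.class.getName());

  // Returns true only if the message was built and logged, which shows that
  // the concatenation is skipped when FINE is disabled.
  static boolean logReadEvent(Object eventType, Object event) {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Read HistoryEvent, eventType=" + eventType + ", event=" + event);
      return true;
    }
    return false;
  }

  // Hypothetical switch between normal (INFO) and verbose (FINE) logging.
  static void setVerbose(boolean verbose) {
    LOG.setLevel(verbose ? Level.FINE : Level.INFO);
  }
}
```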
{code}
new Path(currentAttemptRecoveryDataDir,
    appId.toString().replace("application", "dag")
        + "_1" + TezConstants.DAG_RECOVERY_RECOVER_FILE_SUFFIX);
{code}
Why is it always _1?
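If the intent is for the file name to match the DAG being recovered, one option is to derive the numeric suffix from the DAG's sequence number instead of hard-coding `_1`. A minimal sketch, assuming application IDs of the form `application_<clusterTimestamp>_<sequence>`; `recoverFileName` and its `dagSeq` parameter are hypothetical, and the suffix is passed in because the value of `TezConstants.DAG_RECOVERY_RECOVER_FILE_SUFFIX` is not shown here:

```java
// Hypothetical helper: builds "dag_<clusterTimestamp>_<sequence>_<dagSeq><suffix>"
// from an application ID string instead of always appending "_1".
class RecoverFileNameSketch {
  static String recoverFileName(String appId, int dagSeq, String suffix) {
    if (!appId.startsWith("application_")) {
      throw new IllegalArgumentException("Unexpected application ID: " + appId);
    }
    // Same transformation as the patch, but only the leading prefix is replaced.
    String dagIdPrefix = "dag" + appId.substring("application".length());
    return dagIdPrefix + "_" + dagSeq + suffix;
  }
}
```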
Typos: vertexGroupMemeberCommitStatus ("Memeber" should be "Member"), VERTEX_INIT_GENREATED_EVENTS ("GENREATED" should be "GENERATED").
How is DagRecoveryData.isRecoverable() different from isDAGRecoverable()?
Can they be put into a common method? The same applies to task and attempt.
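One way to share this logic is a small base class that owns the non-recoverable flag and reason, with DAG-, task- and attempt-level data supplying only their own checks. A sketch under that assumption; `RecoverableData`, `checkRecoverable`, `TaskRecoveryDataSketch` and its example condition are all hypothetical:

```java
// Hypothetical sketch: one shared place for the "nonRecoverable + reason"
// state, so DAG-, task- and attempt-level data do not each reimplement it.
abstract class RecoverableData {
  private boolean nonRecoverable = false;
  private String reason;

  // First failure wins; later calls keep the original reason.
  protected final void markNonRecoverable(String why) {
    if (!nonRecoverable) {
      nonRecoverable = true;
      reason = why;
    }
  }

  // Each level runs its own checks and calls markNonRecoverable as needed.
  protected abstract void checkRecoverable();

  // Runs the subclass checks, then reports the (sticky) result.
  public final boolean isRecoverable() {
    checkRecoverable();
    return !nonRecoverable;
  }

  public final String getReason() {
    return reason;
  }
}

// Hypothetical task-level subclass with an illustrative condition.
class TaskRecoveryDataSketch extends RecoverableData {
  boolean committedWithoutFinishEvent;

  @Override
  protected void checkRecoverable() {
    if (committedWithoutFinishEvent) {
      markNonRecoverable("Task output committed but finish event not seen");
    }
  }
}
```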
{code}
VertexRecoveryData vertexRecoveryData =
    recoveredDAGData.vertexRecoveryData.get(initGeneratedEvent.getVertexID());
if (vertexRecoveryData == null) {
  vertexRecoveryData = new VertexRecoveryData();
  recoveredDAGData.vertexRecoveryData.put(initGeneratedEvent.getVertexID(), vertexRecoveryData);
}
{code}
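If the code base can rely on Java 8, `Map.computeIfAbsent` collapses this get / null-check / put sequence into one call. A self-contained sketch with stand-in types; `VertexRecoveryDataStub`, `RecoveredDagDataSketch`, `getOrCreate` and the `String` vertex IDs are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for VertexRecoveryData so the sketch compiles alone.
class VertexRecoveryDataStub {
  final List<String> initGeneratedEvents = new ArrayList<>();
}

class RecoveredDagDataSketch {
  final Map<String, VertexRecoveryDataStub> vertexRecoveryData = new HashMap<>();

  // Java 8+: creates and stores the entry only if it is missing, then returns it.
  VertexRecoveryDataStub getOrCreate(String vertexId) {
    return vertexRecoveryData.computeIfAbsent(vertexId, id -> new VertexRecoveryDataStub());
  }
}
```

On Java 7 the same effect needs a small private helper wrapping the original three steps.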
taGeneratedEvents should be in TaskAttemptRecoveryData, right? There could be
multiple attempts for a task and each could generate events.
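A possible layout, assuming events are stored per attempt: the task-level data maps each attempt ID to its own holder (playing the role of `TaskAttemptRecoveryData`), so events from different attempts never mix. `TaskAttemptRecoveryDataSketch`, `TaskRecoveryDataLayout` and the `String` types are hypothetical stand-ins:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-attempt holder; in the patch the IDs would be
// TezTaskAttemptID and the events the generated-event type.
class TaskAttemptRecoveryDataSketch {
  final List<String> taGeneratedEvents = new ArrayList<>();
}

class TaskRecoveryDataLayout {
  // Each attempt of the task keeps its own generated events.
  final Map<String, TaskAttemptRecoveryDataSketch> attempts = new HashMap<>();

  void addGeneratedEvent(String attemptId, String event) {
    attempts.computeIfAbsent(attemptId, id -> new TaskAttemptRecoveryDataSketch())
        .taGeneratedEvents.add(event);
  }

  // Returns an empty list for attempts that generated no events.
  List<String> eventsFor(String attemptId) {
    TaskAttemptRecoveryDataSketch a = attempts.get(attemptId);
    if (a == null) {
      return new ArrayList<>();
    }
    return a.taGeneratedEvents;
  }
}
```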
> Umbrella for Tez Recovery Redesign
> ----------------------------------
>
> Key: TEZ-2581
> URL: https://issues.apache.org/jira/browse/TEZ-2581
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-2581-WIP-1.patch, TEZ-2581-WIP-2.patch,
> TEZ-2581-WIP-3.patch, TEZ-2581-WIP-4.patch, TEZ-2581-WIP-5.patch,
> TEZ-2581-WIP-6.patch, TezRecoveryRedesignProposal.pdf,
> TezRecoveryRedesignV1.1.pdf
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)