GitHub user davidyan74 commented on a diff in the pull request:
https://github.com/apache/incubator-apex-core/pull/185#discussion_r49259156
--- Diff: engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java ---
@@ -1917,25 +1930,30 @@ public void updateRecoveryCheckpoints(PTOperator operator, UpdateCheckpointsCont
long currentWindowId = WindowGenerator.getWindowId(ctx.currentTms,
this.vars.windowStartMillis,
this.getLogicalPlan().getValue(LogicalPlan.STREAMING_WINDOW_SIZE_MILLIS));
maxCheckpoint = currentWindowId;
}
+ ctx.visited.add(operator);
// DFS downstream operators
- for (PTOperator.PTOutput out : operator.getOutputs()) {
- for (PTOperator.PTInput sink : out.sinks) {
- PTOperator sinkOperator = sink.target;
- if (!ctx.visited.contains(sinkOperator)) {
- // downstream traversal
- updateRecoveryCheckpoints(sinkOperator, ctx);
- }
- // recovery window id cannot move backwards
- // when dynamically adding new operators
- if (sinkOperator.getRecoveryCheckpoint().windowId >= operator.getRecoveryCheckpoint().windowId) {
- maxCheckpoint = Math.min(maxCheckpoint, sinkOperator.getRecoveryCheckpoint().windowId);
- }
+ if (operator.getOperatorMeta().getOperator() instanceof Operator.DelayOperator) {
+ addVisited(operator, ctx);
+ } else {
--- End diff --
@tweise I think I'm not doing this correctly, which would explain the
out-of-sequence tuple in the unit test. My debugging indicates that the
recovery checkpoints are out of sync across the operators that try to
recover. Can you please review this and tell me what I'm doing wrong?
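
For reference, the downstream traversal in the removed lines can be sketched roughly as below. This is a simplified model, not the Apex code: `Node`, its `checkpoints` array, and `update` are hypothetical stand-ins for `PTOperator`, its checkpoint list, and `updateRecoveryCheckpoints`. It assumes an acyclic graph, and it assumes that an operator's recovery checkpoint is its latest checkpoint that is no later than any downstream recovery checkpoint. It makes no attempt to handle the loop that a delay operator introduces.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model for illustration; not the Apex PTOperator API.
class RecoveryCheckpointSketch {

  static class Node {
    final String name;
    final long[] checkpoints;                 // window ids checkpointed so far, ascending
    final List<Node> downstream = new ArrayList<>();
    long recoveryCheckpoint = 0;              // 0 means "no usable checkpoint" in this sketch

    Node(String name, long... checkpoints) {
      this.name = name;
      this.checkpoints = checkpoints;
    }
  }

  // Post-order DFS: settle every downstream node first, then choose this
  // node's latest checkpoint that does not exceed any downstream choice.
  static void update(Node node, Set<Node> visited, long currentWindowId) {
    visited.add(node);
    long maxCheckpoint = currentWindowId;
    for (Node sink : node.downstream) {
      if (!visited.contains(sink)) {
        update(sink, visited, currentWindowId);
      }
      // In a DAG a visited sink is already settled, so its value is final.
      maxCheckpoint = Math.min(maxCheckpoint, sink.recoveryCheckpoint);
    }
    long chosen = 0;
    for (long c : node.checkpoints) {
      if (c <= maxCheckpoint) {
        chosen = c;
      }
    }
    node.recoveryCheckpoint = chosen;
  }

  public static void main(String[] args) {
    // Diamond: A -> B -> D and A -> C -> D
    Node a = new Node("A", 10, 20, 30);
    Node b = new Node("B", 10, 20);
    Node c = new Node("C", 10, 30);
    Node d = new Node("D", 10, 20);
    a.downstream.add(b);
    a.downstream.add(c);
    b.downstream.add(d);
    c.downstream.add(d);

    update(a, new HashSet<>(), 40);
    for (Node n : new Node[] {a, b, c, d}) {
      System.out.println(n.name + " -> " + n.recoveryCheckpoint);
    }
  }
}
```

In this sketch an upstream node never recovers from a later window than any node it feeds. With a loop in the graph, a sink reached through a back edge would still hold a stale value when it is read, which may be related to the mismatch described above.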
---