GitHub user tweise commented on a diff in the pull request:
https://github.com/apache/incubator-apex-core/pull/185#discussion_r49414079
--- Diff: engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java ---
@@ -1917,25 +1930,30 @@ public void updateRecoveryCheckpoints(PTOperator operator, UpdateCheckpointsCont
long currentWindowId = WindowGenerator.getWindowId(ctx.currentTms, this.vars.windowStartMillis, this.getLogicalPlan().getValue(LogicalPlan.STREAMING_WINDOW_SIZE_MILLIS));
maxCheckpoint = currentWindowId;
}
+ ctx.visited.add(operator);
// DFS downstream operators
- for (PTOperator.PTOutput out : operator.getOutputs()) {
- for (PTOperator.PTInput sink : out.sinks) {
- PTOperator sinkOperator = sink.target;
- if (!ctx.visited.contains(sinkOperator)) {
- // downstream traversal
- updateRecoveryCheckpoints(sinkOperator, ctx);
- }
- // recovery window id cannot move backwards
- // when dynamically adding new operators
- if (sinkOperator.getRecoveryCheckpoint().windowId >= operator.getRecoveryCheckpoint().windowId) {
- maxCheckpoint = Math.min(maxCheckpoint, sinkOperator.getRecoveryCheckpoint().windowId);
- }
+ if (operator.getOperatorMeta().getOperator() instanceof Operator.DelayOperator) {
+ addVisited(operator, ctx);
+ } else {
--- End diff --
It's not working because the recovery checkpoint of the operator where the
delay loop joins can be older than the recovery checkpoints of the downstream
operators. Therefore, when traversing the loop, upstream checkpoints need to be
taken into consideration, which is part of the broader solution Pramod refers
to. I am looking into this further and would also like to clean up the
special-case handling for the delay operator.
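
To illustrate the point, here is a toy model, not the StreamingContainerManager code. The class `LoopCheckpointSketch`, the nested `Op` type, and the window ids are all made up for the example. It assumes a loop A -> B -> Delay -> A in which A, the operator where the loop joins, has the oldest checkpoint. A downstream traversal that stops at the delay operator never sees A's older checkpoint when started from B; one that follows the loop back does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LoopCheckpointSketch {

  /** Hypothetical stand-in for a physical operator; not Apex's PTOperator. */
  static final class Op {
    final String name;
    final long checkpointWindowId; // assumed: latest checkpointed window
    final boolean isDelay;         // assumed: marks the delay operator
    final List<Op> sinks = new ArrayList<>();

    Op(String name, long checkpointWindowId, boolean isDelay) {
      this.name = name;
      this.checkpointWindowId = checkpointWindowId;
      this.isDelay = isDelay;
    }
  }

  /**
   * Minimum checkpoint window id over op and the operators reachable
   * downstream of it. When followDelay is false, the traversal does not
   * continue past a delay operator, so the part of the loop that lies
   * "upstream" of the starting operator is never inspected.
   */
  static long minCheckpoint(Op op, boolean followDelay, Set<Op> visited) {
    if (!visited.add(op)) {
      return Long.MAX_VALUE; // already counted, contributes nothing new
    }
    long min = op.checkpointWindowId;
    if (op.isDelay && !followDelay) {
      return min;
    }
    for (Op sink : op.sinks) {
      min = Math.min(min, minCheckpoint(sink, followDelay, visited));
    }
    return min;
  }

  /** Builds the loop A -> B -> Delay -> A and returns B. */
  static Op buildLoopAndReturnB() {
    Op a = new Op("A", 5, false);
    Op b = new Op("B", 8, false);
    Op delay = new Op("Delay", 8, true);
    a.sinks.add(b);
    b.sinks.add(delay);
    delay.sinks.add(a);
    return b;
  }

  public static void main(String[] args) {
    long stopAtDelay = minCheckpoint(buildLoopAndReturnB(), false, new HashSet<>());
    long followLoop = minCheckpoint(buildLoopAndReturnB(), true, new HashSet<>());
    System.out.println("stop at delay: " + stopAtDelay); // prints 8
    System.out.println("follow loop:   " + followLoop);  // prints 5
  }
}
```

In this model, B would be assigned window 8 if the traversal stops at the delay operator, even though A can only recover from window 5. Whether the eventual fix follows the loop this way or handles it differently is not settled in this thread.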