Github user tweise commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-core/pull/185#discussion_r49414079
  
    --- Diff: engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java ---
    @@ -1917,25 +1930,30 @@ public void updateRecoveryCheckpoints(PTOperator operator, UpdateCheckpointsCont
           long currentWindowId = WindowGenerator.getWindowId(ctx.currentTms, this.vars.windowStartMillis, this.getLogicalPlan().getValue(LogicalPlan.STREAMING_WINDOW_SIZE_MILLIS));
           maxCheckpoint = currentWindowId;
         }
    +    ctx.visited.add(operator);
     
         // DFS downstream operators
    -    for (PTOperator.PTOutput out : operator.getOutputs()) {
    -      for (PTOperator.PTInput sink : out.sinks) {
    -        PTOperator sinkOperator = sink.target;
    -        if (!ctx.visited.contains(sinkOperator)) {
    -          // downstream traversal
    -          updateRecoveryCheckpoints(sinkOperator, ctx);
    -        }
    -        // recovery window id cannot move backwards
    -        // when dynamically adding new operators
    -        if (sinkOperator.getRecoveryCheckpoint().windowId >= operator.getRecoveryCheckpoint().windowId) {
    -          maxCheckpoint = Math.min(maxCheckpoint, sinkOperator.getRecoveryCheckpoint().windowId);
    -        }
    +    if (operator.getOperatorMeta().getOperator() instanceof Operator.DelayOperator) {
    +      addVisited(operator, ctx);
    +    } else {
    --- End diff --
    
    This is not working because the recovery checkpoint of the operator where
    the delay loop joins can be older than those of the downstream operators.
    Therefore, when traversing the loop, upstream checkpoints need to be taken
    into consideration, which is part of the broader solution Pramod refers to.
    I am looking into this further and would also like to clean up the
    special-case handling for the delay operator.
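
To make the problem concrete, here is a minimal sketch under a simplified model. It is not the code in `StreamingContainerManager`: the class name, the string operator names, and the two maps are hypothetical stand-ins for the physical plan. It assumes an operator's recovery window is the minimum checkpoint among all operators reachable downstream, so following the delay operator's output back to the join operator pulls the older upstream checkpoint into the result.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a physical plan with a delay loop.
public class LoopCheckpointSketch {

  // Example topology (assumed): A -> join -> B -> delay -> join
  static Map<String, List<String>> exampleDownstream() {
    return Map.of(
        "A", List.of("join"),
        "join", List.of("B"),
        "B", List.of("delay"),
        "delay", List.of("join"));
  }

  // Latest checkpoint window per operator; the join operator lags behind.
  static Map<String, Long> exampleCheckpoints() {
    return Map.of("A", 30L, "join", 10L, "B", 20L, "delay", 20L);
  }

  // Minimum checkpoint over everything reachable from start, including
  // operators reached by following the loop back upstream.
  // Every operator is expected to have an entry in the checkpoint map.
  static long recoveryWindow(String start,
                             Map<String, List<String>> downstream,
                             Map<String, Long> checkpoint) {
    long min = Long.MAX_VALUE;
    Set<String> visited = new HashSet<>();
    Deque<String> stack = new ArrayDeque<>();
    stack.push(start);
    while (!stack.isEmpty()) {
      String op = stack.pop();
      if (!visited.add(op)) {
        continue; // already handled; this is what terminates the loop
      }
      min = Math.min(min, checkpoint.get(op));
      for (String sink : downstream.getOrDefault(op, List.of())) {
        stack.push(sink);
      }
    }
    return min;
  }

  public static void main(String[] args) {
    // B checkpointed at 20, but the join operator it feeds via the delay
    // operator is only at 10, so B cannot recover from anything newer.
    System.out.println(
        recoveryWindow("B", exampleDownstream(), exampleCheckpoints())); // prints 10
  }
}
```

In this model a traversal that skips the already-visited join operator when it is reached again from the delay operator, and looks only at B and the delay operator, would report 20 for B, which is too new.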

