Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/2629
  
    The combination of `UNORDERED` with event time is not strictly meaningless. 
The only thing which one has to do respect is that only those elements in the 
queue from the beginning to the first watermark element are allowed to be 
processed in an `UNORDERED` fashion. Only they have been emitted, the watermark 
can be emitted. After the watermark has been emitted, the next batch of stream 
records can be processed `UNORDERED`. An example:
    
    Given `queue = s1, s2, s3, w1, s4, w2, s5, s6, w3` we can process `s1, s2` 
`s3` `UNORDERED`. After they have been emitted, `w1` has to be emitted. And 
only after the emission of `w1`, we can start emitting `s4`. After `w2` we can 
process `s5` and `s6` `UNORDERED`. And so on...
    
    Concerning the exactly once processing guarantees. The problem is the 
following with the current implementation. When you call 
`AsyncWaitOperator.snapshotState` it will call 
`AsyncCollectorBuffer.getStreamElementsInBuffer` which will block the `Emitter` 
thread. However, after `getStreamElementsInBuffer` has completed, the `Emitter` 
thread is again allowed to emit new elements, right? Thus, this can happen 
before the `StreamTask` has actually send the checkpoint barrier to downstream 
operators because it is not guarded by the checkpoint lock. So you might have 
an element `x1` contained in the checkpoint of the `AsyncWaitOperator` and in 
one of the downstream operators. That's the reason why you only have at-least 
once processing guarantees.
    
    I fear that the hand-tailored solution for the `AsyncWaitOperator` in the 
`StreamTask.performCheckpoint` method won't solve the problems with the exactly 
once processing guarantees because of the afore-mentioned problems. I don't see 
another way to solve the problem at the moment other than using the checkpoint 
lock. Have you measured the performance penalty for multiple chained 
`AsyncWaitOperators` when using the checkpoint lock?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to