Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/2629

The combination of `UNORDERED` with event time is not strictly meaningless. The only constraint one has to respect is that only the elements in the queue from the beginning up to the first watermark element may be processed in an `UNORDERED` fashion. Only once they have all been emitted can the watermark be emitted. After the watermark has been emitted, the next batch of stream records can be processed `UNORDERED`.

An example: Given `queue = s1, s2, s3, w1, s4, w2, s5, s6, w3`, we can process `s1`, `s2`, `s3` `UNORDERED`. After they have been emitted, `w1` has to be emitted. Only after the emission of `w1` can we start emitting `s4`. After `w2` we can process `s5` and `s6` `UNORDERED`. And so on...

Concerning the exactly-once processing guarantees: the problem with the current implementation is the following. When you call `AsyncWaitOperator.snapshotState`, it will call `AsyncCollectorBuffer.getStreamElementsInBuffer`, which will block the `Emitter` thread. However, after `getStreamElementsInBuffer` has completed, the `Emitter` thread is again allowed to emit new elements, right? Thus, this can happen before the `StreamTask` has actually sent the checkpoint barrier to the downstream operators, because the emission is not guarded by the checkpoint lock. So you might have an element `x1` contained both in the checkpoint of the `AsyncWaitOperator` and in one of the downstream operators. That's the reason why you only have at-least-once processing guarantees.

I fear that the hand-tailored solution for the `AsyncWaitOperator` in the `StreamTask.performCheckpoint` method won't solve the problems with the exactly-once processing guarantees, for the reasons mentioned above. I don't see another way to solve the problem at the moment other than using the checkpoint lock.

Have you measured the performance penalty for multiple chained `AsyncWaitOperators` when using the checkpoint lock?
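
To make the batching rule from the queue example concrete, here is a minimal Java sketch. It is not code from the pull request: the class `WatermarkSegments`, the method `emissionGroups`, and the convention that a string starting with `w` stands for a watermark are all assumptions made purely for illustration. The sketch only computes the groups; elements inside a record group could be emitted in any order, while the groups themselves have to be emitted in sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class WatermarkSegments {

    /**
     * Splits a queue into emission groups. Each run of stream records between
     * two watermarks forms one group whose members may be emitted UNORDERED.
     * Each watermark forms its own single-element group and acts as a barrier.
     * Assumption for this sketch: names starting with "w" are watermarks.
     */
    public static List<List<String>> emissionGroups(List<String> queue) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String element : queue) {
            if (element.startsWith("w")) {
                // Close the pending record group before the watermark.
                if (!current.isEmpty()) {
                    groups.add(current);
                    current = new ArrayList<>();
                }
                groups.add(Collections.singletonList(element));
            } else {
                current.add(element);
            }
        }
        // Records after the last watermark still form a group.
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> queue =
                Arrays.asList("s1", "s2", "s3", "w1", "s4", "w2", "s5", "s6", "w3");
        // prints [[s1, s2, s3], [w1], [s4], [w2], [s5, s6], [w3]]
        System.out.println(emissionGroups(queue));
    }
}
```

In a real operator the grouping would presumably be maintained incrementally as elements arrive and complete, rather than computed over a finished list, but the ordering constraint would be the same.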