Github user tillrohrmann commented on the issue:
https://github.com/apache/flink/pull/2629
The combination of `UNORDERED` with event time is not strictly meaningless.
The only thing which one has to do respect is that only those elements in the
queue from the beginning to the first watermark element are allowed to be
processed in an `UNORDERED` fashion. Only they have been emitted, the watermark
can be emitted. After the watermark has been emitted, the next batch of stream
records can be processed `UNORDERED`. An example:
Given `queue = s1, s2, s3, w1, s4, w2, s5, s6, w3` we can process `s1, s2`
`s3` `UNORDERED`. After they have been emitted, `w1` has to be emitted. And
only after the emission of `w1`, we can start emitting `s4`. After `w2` we can
process `s5` and `s6` `UNORDERED`. And so on...
Concerning the exactly once processing guarantees. The problem is the
following with the current implementation. When you call
`AsyncWaitOperator.snapshotState` it will call
`AsyncCollectorBuffer.getStreamElementsInBuffer` which will block the `Emitter`
thread. However, after `getStreamElementsInBuffer` has completed, the `Emitter`
thread is again allowed to emit new elements, right? Thus, this can happen
before the `StreamTask` has actually send the checkpoint barrier to downstream
operators because it is not guarded by the checkpoint lock. So you might have
an element `x1` contained in the checkpoint of the `AsyncWaitOperator` and in
one of the downstream operators. That's the reason why you only have at-least
once processing guarantees.
I fear that the hand-tailored solution for the `AsyncWaitOperator` in the
`StreamTask.performCheckpoint` method won't solve the problems with the exactly
once processing guarantees because of the afore-mentioned problems. I don't see
another way to solve the problem at the moment other than using the checkpoint
lock. Have you measured the performance penalty for multiple chained
`AsyncWaitOperators` when using the checkpoint lock?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---