aljoscha commented on a change in pull request #14312:
URL: https://github.com/apache/flink/pull/14312#discussion_r542271310

##########
File path: docs/dev/datastream_execution_mode.md
##########
@@ -237,6 +237,36 @@ next key.
 See [FLIP-140](https://cwiki.apache.org/confluence/x/kDh4CQ) for background information on this.
+### Order of Processing
+
+The order in which records are processed in operators or user defined functions
+(UDFs) can differ between `BATCH` and `STREAMING` execution.
+
+In `STREAMING` mode, user defined functions should not make any assumptions
+about the order of incoming records. Records are processed immediately once
+they arrive in sources.
+
+In `BATCH` execution mode, there are some operations where Flink guarantees
+order. The ordering can be a side effect of the special task scheduling,
+network shuffle, and state backend (see above) or it can be a conscious choice
+by the system.
+
+There are three general types of input that we can differentiate:
+
+- _keyed input_: input from a `KeyedStream`
+- _broadcast input_: input from a broadcast stream (see also [Broadcast
+  State]({% link dev/stream/state/broadcast_state.md %}))
+- _regular input_: input that isn't any of the above types of input
+
+These are the ordering rules for the different input types
+
+- keyed inputs are processed after all other inputs
+- broadcast inputs are processed before regular inputs
+
+As mentioned above, the keyed input will be grouped and Flink will process all
+records of a keyed group consecutively before processing the next group.

Review comment:
Perfect, thanks! I knew I was asking the right person.
