We recently started a discussion on the Flink ML about this topic: [1] The gist of it is that for some use cases tracking the watermark per-key instead of globally (or rather per partition) can be useful for some cases. Think, for example, of tracking some user data off mobile phones where the user-id/phone number is the key. For these cases, one slow key, i.e. a disconnected phone, would slow down the watermark.
I'm not saying that this is a good idea, I'm actually skeptical because the implementation seems quite complicated/costly to me and I have some doubts about being able to track the watermark per-key in the sources. It's just that more people seem to be asking about this lately and I would like to have a discussion with the clever Beam people because we claim to be at the forefront of parallel data processing APIs. :-) Also note that I'm aware that in the example given above it would be difficult to find out if one key is slow in the first place and therefore hold the watermark for that key. If you're interested, please have a look at the discussion on the Flink ML that I linked to above. It's only 4 mails so far and Jamie gives a nicer explanation of the possible use cases than I'm doing here. Note, that my discussion of feasibility and APIs/key lineage should also apply to Beam. What do you think? [1] https://lists.apache.org/thread.html/2b90d5b1d5e2654212cfbbcc6510ef424bbafc4fadb164bd5aff9216@%3Cdev.flink.apache.org%3E