Per Key Grained Watermark Support

廖嘉逸 Sun, 22 Sep 2019 21:16:45 -0700

Hi all,

Currently Watermark can only be supported on task’s level(or partition level), 
which means that the data belonging to the faster key has to share the same 
watermark with the data belonging to the slower key in the same key group of a 
KeyedStream. This will lead to two problems:





1. Latency. For example, every key has its own window state but they have to 
trigger it after the window’s end time is exceeded by the watermark which is 
determined by the data belonging to the slowest key usually. (Same in 
CepOperator and other operators which are using watermark to fire result)

2. States Size. Because the faster key delayes its firing on result, it has to 
store more redundant states which should be pruned earlier.




However, since the watermark has been introduced for a long time and not been 
designed to be more fine-grained in the first place, I find that it’s very hard 
to solve this problem without a big change. I wonder if there is anyone in 
community having some successful experience on this or maybe there is a 
shortcut way? If not, I can try to draft a design if this is needed in 
community.







Best Regards,

Jiayi Liao

Per Key Grained Watermark Support

Reply via email to