My simplified scenario is that I want to count how many A and B events I am
receiving each minute. I also need to support late data.
Let's assume I am receiving 2 A and 2 B events in the same window
(00:00:00-00:00:001)
So the expected result is two A events , and two B events
Then I receive 2 late B events for the same (00:00:00-00:00:001) window
In the final pane I expect two A events and four B events.
But actually I am getting some accumulated results from previous panes
Below I outlined the actual and expected result.
/**
*
* Panes created by job
* ./00:00-00:01-pane-0-on_time-first
KV{A, 2}
KV{B, 2}
./00:00-00:01-pane-1-late
KV{A, 2}
KV{B, 2}
KV{B, 3}
./00:00-00:01-pane-2-late
KV{A, 2}
KV{B, 2}
KV{B, 3}
KV{B, 4}
*
*
*/
/**
*
* The expected result
* ./00:00-00:01-pane-0-on_time-first
KV{A, 2}
KV{B, 2}
./00:00-00:01-pane-1-late
KV{A, 2}
KV{B, 3}
./00:00-00:01-pane-2-late
KV{A, 2}
KV{B, 4}
*
*
*/
How do I need to change the job to get the expected result?
You can found the test code here https://gist.github.com/pbartoszek/
795452a5015e60dcf8c2112ff941250e
I am using Beam 2.0.0 and TestRunner with TestStream
Thanks,
Pawel