kennknowles opened a new issue, #18677:
URL: https://github.com/apache/beam/issues/18677

   Consider this sequence, with a session gap duration of 5:
   
    - element arrives with timestamp 0, assigned to proto-window [0, 5)
    - watermark advances to 6, emitting the session and discarding it
    - element arrives with timestamp 3 and is assigned to proto-window [3, 8); it 
is not dropped, because that window has not expired
    - watermark advances to 8, emitting that session
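The sequence above can be replayed with a small, self-contained Python model. This is a hypothetical sketch, not Beam's implementation: `Window`, `is_expired` and `simulate` are illustrative names, timestamps are assumed to be integers, allowed lateness is assumed to be zero, and a window is treated as expired once the watermark passes `end - 1 + allowed_lateness`. It shows the second element being accepted into a fresh proto-window and emitted as its own session instead of merging with [0, 5).

```python
from dataclasses import dataclass

GAP = 5               # session gap duration from the example
ALLOWED_LATENESS = 0  # assumption: zero allowed lateness


@dataclass(frozen=True)
class Window:
    """Half-open interval [start, end) over integer timestamps."""
    start: int
    end: int

    def max_timestamp(self) -> int:
        # Last timestamp inside [start, end), assuming integer timestamps.
        return self.end - 1


def is_expired(window: Window, watermark: float) -> bool:
    # Assumed rule: expired once the watermark passes max_timestamp + allowed
    # lateness; data whose proto-window is already expired is dropped.
    return watermark > window.max_timestamp() + ALLOWED_LATENESS


def simulate(events):
    """Replays ("element", ts) and ("watermark", wm) events in order.

    Returns (emitted_windows, dropped_timestamps). Merging only considers
    sessions still held in state, mirroring the behaviour in the issue.
    """
    watermark = float("-inf")
    active = []   # sessions held in state; kept pairwise non-overlapping
    emitted = []
    dropped = []
    for kind, value in events:
        if kind == "element":
            proto = Window(value, value + GAP)  # proto-window for this element
            if is_expired(proto, watermark):
                dropped.append(value)
                continue
            # Merge the proto-window with every overlapping active session.
            merged, remaining = proto, []
            for w in active:
                if w.start < merged.end and merged.start < w.end:
                    merged = Window(min(w.start, merged.start), max(w.end, merged.end))
                else:
                    remaining.append(w)
            active = remaining + [merged]
        elif kind == "watermark":
            watermark = value
            # Fire every session the watermark has passed. This sketch discards
            # state on firing, which matches expiry only because lateness is 0.
            ready = sorted((w for w in active if watermark > w.max_timestamp()),
                           key=lambda w: w.start)
            emitted.extend(ready)
            active = [w for w in active if w not in ready]
    return emitted, dropped


emitted, dropped = simulate([
    ("element", 0),    # proto-window [0, 5)
    ("watermark", 6),  # fires and discards [0, 5)
    ("element", 3),    # proto-window [3, 8) is not expired, so it is kept
    ("watermark", 8),  # fires [3, 8) as a separate session
])
print(emitted)  # [Window(start=0, end=5), Window(start=3, end=8)]
print(dropped)  # []
```

Had the watermark not advanced between the two elements, the same model would merge them into a single session [0, 8); the split only happens because [0, 5) was discarded before the second element arrived.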
   
   While this is "technically correct" according to the spec, it seems 
undesirable. The behavior was introduced when late data dropping was tied to 
window expiry. I think either dropping the second element, or including it and 
emitting a merged window, would be OK.
   
   In the case of sessions, we could simply retain the window until it can no 
longer merge with any non-expired data. Even with zero allowed lateness, this 
is double the gap duration. The window would then be in an unusual state: 
expired and ineligible for further output on its own, yet still able to merge, 
so that the larger merged window could be output.
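For sessions specifically, that retention horizon can be computed. The sketch below assumes integer timestamps and that an element's proto-window [t, t + gap) is expired once `watermark > (t + gap - 1) + allowed_lateness`; `can_still_merge` and `retain_until` are hypothetical helpers, not Beam APIs. A session [start, end) can still merge as long as some element that would not be dropped as late could produce a proto-window overlapping it.

```python
def can_still_merge(start: int, end: int, watermark: int, gap: int,
                    allowed_lateness: int = 0) -> bool:
    """True if an element that would not be dropped as late could still
    produce a proto-window [t, t + gap) overlapping the session [start, end).

    Assumes integer timestamps and that a proto-window is expired once
    watermark > (t + gap - 1) + allowed_lateness.
    """
    # Earliest element timestamp whose proto-window is not yet expired.
    earliest_accepted = watermark - gap + 1 - allowed_lateness
    # An element at t = end - 1 always overlaps [start, end), and later
    # timestamps cannot overlap it, so merging is possible exactly when
    # that element would still be accepted.
    return earliest_accepted <= end - 1


def retain_until(end: int, gap: int, allowed_lateness: int = 0) -> int:
    """First watermark at which the session can no longer merge."""
    return end + gap - 1 + allowed_lateness


# The session [0, 5) from the example, with a gap of 5.
print(can_still_merge(0, 5, watermark=6, gap=5))  # True: the element at 3 overlaps it
print(retain_until(5, gap=5))                     # 9
```

The session [0, 5) ends one gap after its last element (timestamp 0), so a horizon of 9 is one tick short of twice the gap after that element; the off-by-one comes from the integer-timestamp assumption. The calculation depends on knowing how sessions merge.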
   
   The challenge is that sessions are just one kind of merging window, and the 
merging logic has to be treated as opaque, so we cannot rely on reasoning 
specific to how sessions work. The other, more drastic option is to rethink how 
late data dropping is defined for merging windows, particularly in the 
"proto-window" phase.
   
   Imported from Jira 
[BEAM-3568](https://issues.apache.org/jira/browse/BEAM-3568). Original Jira may 
contain additional context.
   Reported by: kenn.

