[GitHub] [beam] kennknowles commented on issue #23379: [Feature Request]: Expose TimerStateInternals.currentOutputWatermarkTime to allow for DoFns to handle elements behind the watermark differently

GitBox Fri, 30 Sep 2022 09:26:31 -0700


kennknowles commented on issue #23379:
URL: https://github.com/apache/beam/issues/23379#issuecomment-1263777770


   OK this is all fine. So a common case will be that a large number of windows 
are processed simultaneously, and then the watermark makes a large jump (in 
batch from -inf to +inf, but it can be any large jump) causing them all to be 
completed. My main point is that we need to avoid mental models that think of 
them being processed one after the other. There is no "next" window, and the 
idea of there being a "latest open window" is not very useful. I think that 
your points are all fine, but the goal here is still not obviously doable.
   
   The user basically wants fixed windows that are processed "one after the 
other" and for elements to be assigned to the "current" window with just a flag 
that indicates whether they really belong to that window or whether they 
"should have" been put in a window that they are too late for.
   
   State & timers are a good way to do this, with a buffer that accepts all 
elements and then emits them when the timer fires. It is easy to label the elements 
as to whether they "should" be in this timer firing or they arrived too late 
for their timer firing.
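
   A minimal sketch of the buffer-and-timer idea, written in plain Java rather than against the Beam API. The class `LatenessLabelingBuffer`, its method names, and the rule that an element counts as late when its event time precedes the start of the interval covered by the next firing are all assumptions made for illustration. In a real pipeline the buffer would presumably be a state cell and `onTimer` an event-time timer callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "buffer everything, emit on timer, label lateness".
// This is NOT Beam code; it only illustrates the labeling logic.
final class LatenessLabelingBuffer {

    // An element together with the flag described above.
    static final class Labeled {
        final String value;
        final long eventTimeMillis;
        final boolean late; // true if it "should have" gone into an earlier firing

        Labeled(String value, long eventTimeMillis, boolean late) {
            this.value = value;
            this.eventTimeMillis = eventTimeMillis;
            this.late = late;
        }
    }

    private final long intervalMillis;
    private final List<Labeled> buffer = new ArrayList<>();
    // Exclusive end of the interval that the next timer firing covers.
    private long nextFiringEnd;

    LatenessLabelingBuffer(long intervalMillis) {
        this.intervalMillis = intervalMillis;
        this.nextFiringEnd = intervalMillis;
    }

    // Accept every element; nothing is dropped for being late.
    void add(String value, long eventTimeMillis) {
        long currentStart = nextFiringEnd - intervalMillis;
        boolean late = eventTimeMillis < currentStart;
        buffer.add(new Labeled(value, eventTimeMillis, late));
    }

    // Simulated timer firing: emit the buffered elements and move to the next interval.
    List<Labeled> onTimer() {
        List<Labeled> out = new ArrayList<>(buffer);
        buffer.clear();
        nextFiringEnd += intervalMillis;
        return out;
    }
}
```

   The sketch deliberately ignores elements whose event time lies beyond the current interval; they are emitted with the next firing and labeled as on time, which a real implementation may want to handle differently.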
   
   The trouble is that the user wants to do this with WriteFiles 
withWindowedWrites, which depends on the window mechanism. I would say that the 
problem is this mismatch. Their use case is easy to express with state & timers 
and does not fit well with windows, but the file sink is tightly coupled to 
windowing.
   
   @scwhittle if you do the thing with state & timers and then just assign each 
output to the window for that timer's timestamp, does that work? I suppose 
there is likely a problem with large iterable elements being too inefficient?
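
   For reference, mapping a timer's timestamp to a fixed window is simple arithmetic. The sketch below is plain Java with a hypothetical helper class `FixedWindowMath`; it assumes half-open windows `[start, start + size)` aligned to the epoch with no offset, and it does not call any Beam API.

```java
// Hypothetical helper: which fixed window contains a given timestamp?
// Assumes half-open windows [start, start + size) aligned to epoch 0.
final class FixedWindowMath {

    // Start of the window containing timestampMillis.
    // Math.floorMod keeps the result correct for negative timestamps too.
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    // Exclusive end of the window containing timestampMillis.
    static long windowEnd(long timestampMillis, long sizeMillis) {
        return windowStart(timestampMillis, sizeMillis) + sizeMillis;
    }
}
```

   One boundary case matters here: with half-open windows, a timestamp exactly equal to a window's end belongs to the following window, so a timer set at the end of an interval would need to output at `end - 1` (or look up the window for `end - 1`) to land in the interval it closes.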


