kennknowles commented on issue #23379: URL: https://github.com/apache/beam/issues/23379#issuecomment-1263777770
OK, this is all fine. A common case will be that a large number of windows are processed simultaneously, and then the watermark makes a large jump (in batch, from -inf to +inf, but it can be any large jump), causing them all to complete at once. My main point is that we need to avoid mental models in which windows are processed one after the other. There is no "next" window, and the idea of a "latest open window" is not very useful.

I think your points are all fine, but the goal here is still not obviously doable. The user basically wants fixed windows that are processed "one after the other", with elements assigned to the "current" window along with a flag that indicates whether they really belong to that window or whether they "should have" been put in a window that they are too late for.

State & timers are a good way to do this: a buffer accepts all elements and then emits them when the timer fires. It is easy to label each element as to whether it "should" be in this timer firing or arrived too late for its own timer firing.

The trouble is that the user wants to do this with WriteFiles withWindowedWrites, which depends on the window mechanism. I would say that the problem is this mismatch: their use case is easy to express with state & timers and does not fit well with windows, but the file sink is tightly coupled to windowing.

@scwhittle if you do the thing with state & timers and then just assign each output to the window for that timer's timestamp, does that work? I suppose there is likely a problem with large iterable elements being too inefficient?
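
To make the buffer-and-timer idea concrete, here is a rough sketch in plain Python. It deliberately uses no Beam APIs: `LateFlaggingBuffer`, `add`, `fire`, and `window_size` are hypothetical names that stand in for per-key state and an event-time timer. It also assumes that elements timestamped at or after the firing time stay buffered for a later firing, which is one possible choice rather than something settled in this thread.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class LateFlaggingBuffer:
    """Hypothetical stand-in for per-key state plus an event-time timer."""

    window_size: int
    buffer: List[Tuple[Any, int]] = field(default_factory=list)

    def add(self, value: Any, event_ts: int) -> None:
        # Accept every element, however old its timestamp is.
        self.buffer.append((value, event_ts))

    def fire(self, timer_ts: int) -> List[Tuple[Any, Tuple[int, int], bool]]:
        """Simulate the timer for the window ending at timer_ts firing."""
        window = (timer_ts - self.window_size, timer_ts)
        emitted, kept = [], []
        for value, event_ts in self.buffer:
            if event_ts >= timer_ts:
                # Assumption: this belongs to a later firing, so keep it.
                kept.append((value, event_ts))
            else:
                # Late if the element's own window ended before this one began.
                is_late = event_ts < window[0]
                emitted.append((value, window, is_late))
        self.buffer = kept
        return emitted


buf = LateFlaggingBuffer(window_size=10)
buf.add("a", 3)
buf.add("b", 12)
first = buf.fire(10)   # emits "a" on time; "b" stays buffered
buf.add("c", 4)        # arrives after its own timer has already fired
second = buf.fire(20)  # emits "b" on time and "c" flagged as late
```

Each emitted record carries the window derived from the timer's timestamp together with the lateness flag. Whether re-assigning outputs to that window plays well with WriteFiles is exactly the open question above.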
