Thanks for the responses, much appreciated. I will continue to experiment.

Ben
On Sun Feb 15 2015 at 21:22:17 Julian Hyde <jul...@hydromatic.net> wrote:

> +1
>
> As far as possible, behavior should be deterministic, that is, determined
> by the data rather than when the query was started or the arrival time of
> the data.
>
> Of course, for the query to make progress, there should be ways to discard
> late data and to indicate that a producer is alive but doesn't have any
> data to send for a particular time period. But for normal operation, a
> slight change in record arrival time or relative order of records from
> different producers should not radically change the output.
>
> We've been having discussions about SQL support for rolling, paged and
> tumbling windows. We'll be able to trigger emission of rows at the top of
> the hour, based on the time stamp of the data, and other intervals.
> Punctuation will allow timely emission even if there is no data flowing.
>
> Julian
>
>
> On Feb 15, 2015, at 10:51, Benjamin Edwards <edwards.b...@gmail.com> wrote:
> >
> > Hi
> >
> > Based on what I can see in the run loop class, there are a few things that
> > seem a little problematic for windowed processing with respect to time:
> >
> > 1) No ability to schedule *when* on an interval you might start. For
> > instance, if you wanted to process a window on the hour, every hour, there
> > is no way to do this.
> >
> > 2) You don't get passed the time. I guess this is simply due to the fact
> > that the window isn't really trying to keep up, or pin itself to a given
> > phase. If you get behind, well tough. You just added some phase to your
> > series.
> >
> > What do people normally do to mitigate this? I was thinking that rather
> > than using the Windowed task I would simply have the producer use a timer
> > and once a period send a control message with the time stamp. This would
> > indicate to my task that period was up and state should be flushed to db,
> > aggregated to another stream etc..
> >
> > Note that I am not trying to do real time processing with hard constraints,
> > or anything like that, I just need things that mostly happened within a
> > given frame to get grouped and most importantly for things to happen "on
> > the minute" or "on the hour" etc.
> >
> > Ben
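
For anyone experimenting along the same lines, the control-message idea from the thread can be sketched roughly as below. This is only an illustration under stated assumptions, not how any framework implements it: the names `window_start`, `TumblingCounter`, `on_record` and `on_punctuation` are hypothetical, timestamps are assumed to be epoch milliseconds, and windows are assumed to be aligned to the epoch so that hourly windows fall "on the hour". Windows are chosen by the timestamp in the data, and a window is emitted only when a control message (punctuation) says its period is over.

```python
from collections import defaultdict

HOUR_MS = 60 * 60 * 1000  # assumed window size: one hour, aligned to the epoch


def window_start(timestamp_ms, size_ms=HOUR_MS):
    """Return the start of the aligned window that contains timestamp_ms."""
    return timestamp_ms - (timestamp_ms % size_ms)


class TumblingCounter:
    """Counts records per aligned window, driven only by data timestamps."""

    def __init__(self, size_ms=HOUR_MS):
        self.size_ms = size_ms
        self.counts = defaultdict(int)  # window start -> record count
        self.closed_before = 0          # windows starting before this are closed

    def on_record(self, timestamp_ms):
        """Add a record; return False if it was late and therefore discarded."""
        start = window_start(timestamp_ms, self.size_ms)
        if start < self.closed_before:
            return False
        self.counts[start] += 1
        return True

    def on_punctuation(self, timestamp_ms):
        """Handle a control message: emit every window that ended by timestamp_ms."""
        boundary = window_start(timestamp_ms, self.size_ms)
        emitted = []
        for start in sorted(self.counts):  # sorted() copies the keys, so pop is safe
            if start < boundary:
                emitted.append((start, self.counts.pop(start)))
        self.closed_before = max(self.closed_before, boundary)
        return emitted
```

Because the grouping depends only on the timestamps carried by the records and the control messages, a task that falls behind emits the same windows later rather than drifting in phase. Records that arrive after their window has been emitted are simply dropped in this sketch; a real job would need to decide how to handle them.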