Ohhh nice! It would be great if you could share some code with us soon. It is indeed a very complicated problem, and there is probably no single solution that fits all use cases, so having one way of doing things would be a great reference. Looking forward to that!
On Wed, Jan 28, 2015 at 4:52 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Hi,
>
> On Thu, Jan 29, 2015 at 1:54 AM, YaoPau <jonrgr...@gmail.com> wrote:
>>
>> My thinking is to maintain state in an RDD and update it and persist it
>> with each 2-second pass, but this also seems like it could get messy. Any
>> thoughts or examples that might help me?
>
> I have just implemented some timestamp-based windowing on DStreams (I can't
> share the code now, but it will be published in a couple of months),
> although with the assumption that items arrive in the correct order. The
> main challenge (a rather technical one) was to keep proper state across RDD
> boundaries and to tell the state "you can mark this partial window from the
> last interval as 'complete' now" without shuffling too much data around.
> For example, if there are some empty intervals, you don't know when the
> next item belonging to the partial window will arrive, or if there will be
> one at all. I guess if you want out-of-order tolerance, it will become
> even trickier, as you need to define and think about some timeout for
> partial windows in your state...
>
> Tobias
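To make the "partial window" problem Tobias describes a bit more concrete, here is a minimal plain-Python sketch of the state logic only. It is not his code and not Spark API; `WindowState`, `process_batch`, the tumbling-window layout and the `timeout` grace period are all hypothetical choices of mine. It assumes items arrive in timestamp order (as in his implementation) and that each micro-batch (e.g. every 2 seconds) is fed in with its batch time. A window is closed either when an item for a later window shows up, or, to cover empty intervals, when the batch clock passes the window's end plus the timeout.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (event timestamp in seconds, value); items are assumed to arrive in timestamp order
Item = Tuple[float, float]


@dataclass
class WindowState:
    """State carried from one micro-batch to the next (hypothetical structure)."""
    window_size: float  # length of each tumbling window, in seconds
    timeout: float      # grace period before an idle partial window is closed
    partial: Dict[int, List[float]] = field(default_factory=dict)  # window index -> values


def process_batch(state: WindowState, batch: List[Item],
                  batch_time: float) -> List[Tuple[int, List[float]]]:
    """Add one micro-batch to the state and return the windows that are now complete."""
    completed: List[Tuple[int, List[float]]] = []
    for ts, value in batch:
        idx = math.floor(ts / state.window_size)
        # In-order assumption: an item for window idx means every earlier window is done.
        for old in sorted(k for k in state.partial if k < idx):
            completed.append((old, state.partial.pop(old)))
        state.partial.setdefault(idx, []).append(value)
    # Empty intervals: another item may never arrive, so close any window whose
    # end plus the timeout has already passed according to the batch clock.
    for idx in sorted(state.partial):
        window_end = (idx + 1) * state.window_size
        if batch_time >= window_end + state.timeout:
            completed.append((idx, state.partial.pop(idx)))
    return completed


if __name__ == "__main__":
    state = WindowState(window_size=10.0, timeout=4.0)
    print(process_batch(state, [(0.5, 1.0), (1.5, 2.0)], 2.0))  # window 0 still open
    print(process_batch(state, [], 4.0))                        # empty interval
    print(process_batch(state, [(11.0, 3.0)], 12.0))            # closes window 0
    print(process_batch(state, [], 24.0))                       # timeout closes window 1
```

In a real streaming job this state would have to live somewhere between batches (an RDD, as YaoPau suggests, or a stateful DStream operation), and with out-of-order data the "later item closes earlier windows" shortcut no longer holds, so only the timeout rule would be safe.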