Oh, nice! It would be great if you could share some code with us soon. It is
indeed a very complicated problem, and there is probably no single
solution that fits all use cases, so having one way of doing things
would be a great reference. Looking forward to that!

On Wed, Jan 28, 2015 at 4:52 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Hi,
>
> On Thu, Jan 29, 2015 at 1:54 AM, YaoPau <jonrgr...@gmail.com> wrote:
>>
>> My thinking is to maintain state in an RDD and update and persist it with
>> each 2-second pass, but this also seems like it could get messy.  Any
>> thoughts or examples that might help me?
>
>
> I have just implemented some timestamp-based windowing on DStreams (I can't
> share the code now, but it will be published in a couple of months),
> although with the assumption that items arrive in the correct order. The main
> challenge (rather technical) was to keep proper state across RDD boundaries
> and to tell the state "you can mark this partial window from the last
> interval as 'complete' now" without shuffling too much data around. For
> example, if there are some empty intervals, you don't know when the next
> item to go into the partial window will arrive, or if there will be one at
> all. I guess if you want to have out-of-order tolerance, that will become
> even trickier, as you need to define and think about some timeout for
> partial windows in your state...
>
> Tobias
>
>
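
To make the discussion above a bit more concrete, here is a rough sketch of
the state handling Tobias describes. It is not his implementation (that code
is not public yet), just one possible reading of it in plain Python without
Spark. The names process_batch, WINDOW_SEC, batch_time and timeout are made
up for illustration. It assumes items arrive in timestamp order and that one
partial window is carried from batch to batch.

```python
WINDOW_SEC = 10  # hypothetical window length, in seconds


def process_batch(state, batch, batch_time=None, timeout=None):
    """Fold one micro-batch of (timestamp, value) items into event-time windows.

    state is None or (window_start, [values]), i.e. the partial window carried
    over from the previous batch. Assumes items arrive in timestamp order.
    Returns (completed_windows, new_state).
    """
    completed = []
    for ts, value in batch:
        start = ts - ts % WINDOW_SEC  # start of the window this item belongs to
        if state is not None and start != state[0]:
            # An item for a later window arrived, so the carried window is complete.
            completed.append(state)
            state = None
        if state is None:
            state = (start, [])
        state[1].append(value)
    # Empty intervals: no later item may ever arrive to close the partial
    # window, so close it once the batch time passes its end plus a timeout.
    if (state is not None and batch_time is not None and timeout is not None
            and batch_time >= state[0] + WINDOW_SEC + timeout):
        completed.append(state)
        state = None
    return completed, state
```

Each 2-second pass would call process_batch with the state returned by the
previous pass. How that state is kept inside Spark (for example per key, and
without shuffling much data) is exactly the hard part and is left out here.
Out-of-order tolerance would need more than one open window in the state.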

