Another complicating issue is a practical requirement that often comes up:
aggregates for late-arriving data must be kept separate from the aggregates
for on-time data in the same window.  This allows late-arriving aggregates to
be reported separately.  This is a fundamental change in the meaning of
windowed aggregates, of course, but it is also a common requirement.
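
To make the requirement concrete, here is a minimal sketch in Python.  It is
not taken from Samza, Flink or Calcite; the class name, the tumbling-window
width and the watermark convention (a window is closed once the watermark
reaches its end) are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; hypothetical tumbling-window width


class SplitWindowCounter:
    """Keeps on-time and late sums apart for each tumbling window.

    Hypothetical illustration only: a window [start, start + size) is
    treated as closed once the watermark reaches start + size.
    """

    def __init__(self, window_size=WINDOW_SIZE):
        self.window_size = window_size
        self.watermark = float("-inf")
        self.on_time = defaultdict(int)  # window start -> on-time sum
        self.late = defaultdict(int)     # window start -> late sum

    def window_start(self, event_time):
        # Align the event time down to the start of its window.
        return event_time - (event_time % self.window_size)

    def advance_watermark(self, t):
        # The watermark never moves backwards.
        self.watermark = max(self.watermark, t)

    def add(self, event_time, value=1):
        start = self.window_start(event_time)
        if start + self.window_size <= self.watermark:
            # The window had already closed: record as a late arrival.
            self.late[start] += value
        else:
            self.on_time[start] += value

    def report(self, start):
        # Both aggregates for the same window, reported separately.
        return {
            "on_time": self.on_time.get(start, 0),
            "late": self.late.get(start, 0),
        }


counter = SplitWindowCounter()
counter.add(5)
counter.add(30)
counter.advance_watermark(60)  # closes the window [0, 60)
counter.add(42)                # belongs to [0, 60) but arrives late
counter.add(61)                # on time for [60, 120)
print(counter.report(0))
print(counter.report(60))
```

The point of the sketch is that a single window now has two results rather
than one, which is why the meaning of the windowed aggregate changes.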



On Mon, Jun 29, 2015 at 7:25 AM, Milinda Pathirage <[email protected]>
wrote:

> Hi Ted,
>
> We have discussed most of the complexities related to window handling in a
> different thread [1]. My bad that I didn't provide those additional details
> when I started this thread. We have a window store (implemented on top of
> Samza's local storage) to keep track of old windows to trigger new results
> for late arrivals. Document [2]  discusses most of the things related to
> window store's design.
>
> Thanks
> Milinda
>
> [1] https://issues.apache.org/jira/browse/SAMZA-552
> [2] https://issues.apache.org/jira/secure/attachment/12708934/DESIGN-SAMZA-552-7.pdf
>
> On Sun, Jun 28, 2015 at 2:21 AM, Ted Dunning <[email protected]>
> wrote:
>
> > Here is the biggest recent thread on this.  You might also ask directly
> > what they think about the algebraic issue as you see it.
> >
> >
> >
> >
> https://mail-archives.apache.org/mod_mbox/flink-dev/201506.mbox/%3CCANMXwW3bOgaJhG_syH2%3D0x5BcdukyTOF0dU3dM4_3yQK2UHoyw%40mail.gmail.com%3E
> >
> > Here are some thoughts that mostly deal with implementation, but also
> > discuss a few theoretical aspects.  These then link into concepts such as
> > data types (Flink recognized sortedness in type information, for
> > instance), the snapshot algorithms (because window triggers are very
> > similar to the Chandy/Lamport algorithms used for snapshots and state
> > handling), the optimizer (only a side comment in this regard) and other
> > aspects.
> >
> >
> >
> https://docs.google.com/document/d/1rSoHyhUhm2IE30o5tkR8GEetjFvMRMNxvsCfoPsW6_4/edit#heading=h.faju7vv5ilgm
> >
> > On Sun, Jun 28, 2015 at 12:48 AM, Julian Hyde <[email protected]> wrote:
> >
> > > Ted,
> > >
> > > Do you have a link to a pertinent email thread from the Flink list?
> > >
> > > I can see how shifting from monotonic to k-sorted or punctuation could
> > > make a big impact to the runtime of a streaming system like Flink. But I
> > > don’t think the impact on the algebra is as big, and that’s what we’re
> > > concerned with in Calcite.
> > >
> > > Julian
> > >
> > >
> > > > On Jun 26, 2015, at 11:18 PM, Ted Dunning <[email protected]>
> > wrote:
> > > >
> > > > On Sat, Jun 27, 2015 at 1:13 AM, Julian Hyde <[email protected]>
> wrote:
> > > >
> > > >> Algebraic reasoning based on monotonicity can be extended to the other
> > > >> models. If we start with the more complex models we'd soon be up to
> > > >> our hubcaps in theoretical mud.
> > > >>
> > > >
> > > > As you like.  Flink has just had to rip up and repair a bunch of stuff
> > > > precisely because they started with an assumption of monotonicity and had
> > > > to move to a looser model.  The practical impact was pretty substantial and
> > > > substantially larger than the comments here would imply.
> > >
> > >
> >
>
>
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
>
