I'm proposing to get this small change into 1.1:
https://issues.apache.org/jira/browse/FLINK-4239 This will make our lives
easier with the future proposed changes.

What do you think?
On Tue, 19 Jul 2016 at 11:41 Aljoscha Krettek <aljos...@apache.org> wrote:

> Hi,
> these new features should make it into the 1.2 release. We are already
> working on releasing 1.1 so it won't make it for that one. unfortunately.
>
> Cheers,
> Aljoscha
>
> On Mon, 18 Jul 2016 at 23:19 Chen Qin <qinnc...@gmail.com> wrote:
>
>> BTW, do you have rough timeline in term of roll out it to production?
>>
>> Thanks,
>> Chen
>>
>>
>> On Mon, Jul 18, 2016 at 2:46 AM, Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>>
>> > Hi,
>> > Chen commented this on the doc (I'm mirroring here so everyone can
>> follow):
>> > "It would be cool to be able to access last snapshot of window states
>> > before it get purged. Pipeline author might consider put it to external
>> > storage and deal with late arriving events by restore corresponding
>> > window."
>> >
>> > My answer:
>> > This is partially covered by the section called "What Happens at
>> > Window-Cleanup Time, Who Decides When to Purge". What I want to
>> introduce
>> > is that the window can have one final emission if there is new data in
>> the
>> > buffers at cleanup time.
>> >
>> > The work on this will also depend on this proposal:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending+Window+Function+Metadata
>> > With
>> > this, the WindowFunction can get meta data about the window firing so it
>> > could be informed that this is the last firing before a cleanup and that
>> > there already was an earlier, on-time firing.
>> >
>> > Does this cover your concerns, Chen?
>> >
>> > Cheers,
>> > Aljoscha
>> >
>> > On Sun, 10 Jul 2016 at 21:24 Chen Qin <qinnc...@gmail.com> wrote:
>> >
>> > > Sure. Currently, it looks like any element assigned to a too late
>> window
>> > > will be dropped silently😓 ?
>> > >
>> > > Having a late window stream imply somehow Flink needs to add one more
>> > state
>> > > to window and split window state cleanup from window retirement.
>> > > I would suggest as simple as adding a function in trigger called
>> > > OnLateElement and always fire_purge it would enable developer aware
>> and
>> > > handle this case.
>> > >
>> > > Chen
>> > >
>> > >
>> > >
>> > > On Fri, Jul 8, 2016 at 1:00 AM, Aljoscha Krettek <aljos...@apache.org
>> >
>> > > wrote:
>> > >
>> > > > @Chen I added a section at the end of the document regarding access
>> to
>> > > the
>> > > > elements that are dropped as late. Right now, the section just
>> mentions
>> > > > that we have to do this but there is no real proposal yet for how
>> to do
>> > > it.
>> > > > Only a rough sketch so that we don't forget about it.
>> > > >
>> > > > On Fri, 8 Jul 2016 at 07:47 Chen Qin <qinnc...@gmail.com> wrote:
>> > > >
>> > > > > +1 for allowedLateness scenario.
>> > > > >
>> > > > > The rationale behind is there are backfills or data issues hold
>> > > in-window
>> > > > > data till watermark pass end time. It cause sink write partial
>> > output.
>> > > > >
>> > > > > Allow high allowedLateness threshold makes life easier to merge
>> those
>> > > > > results and overwrite partial output with correct output at sink.
>> But
>> > > > yeah,
>> > > > > pipeline author is at risk of blow up statebackend with huge
>> states.
>> > > > >
>> > > > > Alternatively, In some case, if sink allows read-check-merge
>> > operation,
>> > > > > window can explicit call out events ingested after
>> allowedLateness.
>> > It
>> > > > asks
>> > > > > pipeline author mitigated these events in a way outside of flink
>> > > > ecosystem.
>> > > > > The catch is that since everywhere in a flink job can replay and
>> > > > > checkpoint, notification should somehow includes these info as
>> well.
>> > > > >
>> > > > > Thanks
>> > > > > Chen
>> > > > >
>> > > > > On Thu, Jul 7, 2016 at 12:14 AM, Kostas Kloudas <
>> > > > > k.klou...@data-artisans.com
>> > > > > > wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > In the effort to move the discussion to the mailing list, rather
>> > than
>> > > > the
>> > > > > > doc,
>> > > > > > there was a comment in the doc:
>> > > > > >
>> > > > > > “It seems this proposal marries the allowed lateness of events
>> and
>> > > the
>> > > > > > discarding of window state. In most use cases this should be
>> > > > sufficient,
>> > > > > > but there are instances where having independent control of
>> these
>> > may
>> > > > be
>> > > > > > useful.
>> > > > > >
>> > > > > > For instance, you may have a job that computes some aggregate,
>> > like a
>> > > > > sum.
>> > > > > > You may want to keep the window state around for a while, but
>> not
>> > too
>> > > > > long.
>> > > > > > Yet you may want to continue processing late events after you
>> > > discarded
>> > > > > the
>> > > > > > window state. It is possible that your stream sinks can make
>> use of
>> > > > this
>> > > > > > data. For instance, they may be writing to a data store that
>> > returns
>> > > an
>> > > > > > error if a row already exists, which allow the sink to read the
>> > > > existing
>> > > > > > row and update it with the new data."
>> > > > > >
>> > > > > > To which I would like to reply:
>> > > > > >
>> > > > > > If I understand your use-case correctly, I believe that the
>> > proposed
>> > > > > > binding of the allowed lateness to the state purging does not
>> > impose
>> > > > any
>> > > > > > problem. The lateness specifies the upper time bound, after
>> which
>> > the
>> > > > > state
>> > > > > > will be discarded. Between the start of a window and its (end +
>> > > > > > allowedLateness) you can write custom triggers that fire, purge
>> the
>> > > > > state,
>> > > > > > or do nothing. Given this, I suppose that, at the most extreme
>> > case,
>> > > > you
>> > > > > > can specify an allowed lateness of Long.MaxValue and do the
>> purging
>> > > of
>> > > > > the
>> > > > > > state "manually". By doing this, you remove the safeguard of
>> > letting
>> > > > the
>> > > > > > system purge the state at some point in time, and you can do
>> your
>> > own
>> > > > > > custom state management that fits your needs.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Kostas
>> > > > > >
>> > > > > > > On Jul 6, 2016, at 5:43 PM, Aljoscha Krettek <
>> > aljos...@apache.org>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > @Vishnu Funny you should ask that because I have a design doc
>> > lying
>> > > > > > around.
>> > > > > > > I'll open a new mail thread to not hijack this one.
>> > > > > > >
>> > > > > > > On Wed, 6 Jul 2016 at 17:17 Vishnu Viswanath <
>> > > > > > vishnu.viswanat...@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > >> Hi,
>> > > > > > >>
>> > > > > > >> I was going through the suggested improvements in window,
>> and I
>> > > have
>> > > > > > >> few questions/suggestion on improvement regarding the
>> Evictor.
>> > > > > > >>
>> > > > > > >> 1) I am having a use case where I have to create a custom
>> > Evictor
>> > > > that
>> > > > > > will
>> > > > > > >> evict elements from the window based on the value (e.g., if I
>> > have
>> > > > > > elements
>> > > > > > >> are of case class Item(id: Int, type:String) then evict
>> elements
>> > > > that
>> > > > > > has
>> > > > > > >> type="a"). I believe this is not currently possible.
>> > > > > > >> 2) this is somewhat related to 1) where there should be an
>> > option
>> > > to
>> > > > > > evict
>> > > > > > >> elements from anywhere in the window. not only from the
>> > beginning
>> > > of
>> > > > > the
>> > > > > > >> window. (e.g., apply the delta function to all elements and
>> > remove
>> > > > all
>> > > > > > >> those don't pass. I checked the code and evict method just
>> > returns
>> > > > the
>> > > > > > >> number of elements to be removed and processTriggerResult
>> just
>> > > skips
>> > > > > > those
>> > > > > > >> many elements from the beginning.
>> > > > > > >> 3) Add an option to enables the user to decide if the
>> eviction
>> > > > should
>> > > > > > >> happen before the apply function or after the apply function.
>> > > > > Currently
>> > > > > > it
>> > > > > > >> is before the apply function, but I have a use case where I
>> need
>> > > to
>> > > > > > first
>> > > > > > >> apply the function and evict afterward.
>> > > > > > >>
>> > > > > > >> I am doing these for a POC so I think I can modify the flink
>> > code
>> > > > base
>> > > > > > to
>> > > > > > >> make these changes and build, but I would appreciate any
>> > > suggestion
>> > > > on
>> > > > > > >> whether these are viable changes or will there any
>> performance
>> > > issue
>> > > > > if
>> > > > > > >> these are done. Also any pointer on where to start(e.g, do I
>> > > create
>> > > > a
>> > > > > > new
>> > > > > > >> class similar to EvictingWindowOperator that extends
>> > > > WindowOperator?)
>> > > > > > >>
>> > > > > > >> Thanks and Regards,
>> > > > > > >> Vishnu Viswanath,
>> > > > > > >>
>> > > > > > >> On Wed, Jul 6, 2016 at 9:39 AM, Aljoscha Krettek <
>> > > > aljos...@apache.org
>> > > > > >
>> > > > > > >> wrote:
>> > > > > > >>
>> > > > > > >>> I did:
>> > > > > > >>>
>> > > > > > >>>
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3ccanmxww0abttjjg9ewdxrugxkjm7jscbenmvrzohpt2qo3pq...@mail.gmail.com%3e
>> > > > > > >>> ;-)
>> > > > > > >>>
>> > > > > > >>> On Wed, 6 Jul 2016 at 15:31 Ufuk Celebi <u...@apache.org>
>> > wrote:
>> > > > > > >>>
>> > > > > > >>>> On Wed, Jul 6, 2016 at 3:19 PM, Aljoscha Krettek <
>> > > > > aljos...@apache.org
>> > > > > > >
>> > > > > > >>>> wrote:
>> > > > > > >>>>> In the future, it might be good to to discussions
>> directly on
>> > > the
>> > > > > ML
>> > > > > > >>> and
>> > > > > > >>>>> then change the document accordingly. This way everyone
>> can
>> > > > follow
>> > > > > > >> the
>> > > > > > >>>>> discussion on the ML. I also feel that Google Doc comments
>> > > often
>> > > > > > >> don't
>> > > > > > >>>> give
>> > > > > > >>>>> enough space for expressing more complex opinions.
>> > > > > > >>>>
>> > > > > > >>>> I agree! Would you mind raising this point as a separate
>> > > > discussion
>> > > > > on
>> > > > > > >>> dev@
>> > > > > > >>>> ?
>> > > > > > >>>>
>> > > > > > >>>
>> > > > > > >>
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Reply via email to