Sure. Currently, it looks like any element assigned to a too late window will be dropped silently😓 ?
Having a late window stream imply somehow Flink needs to add one more state to window and split window state cleanup from window retirement. I would suggest as simple as adding a function in trigger called OnLateElement and always fire_purge it would enable developer aware and handle this case. Chen On Fri, Jul 8, 2016 at 1:00 AM, Aljoscha Krettek <aljos...@apache.org> wrote: > @Chen I added a section at the end of the document regarding access to the > elements that are dropped as late. Right now, the section just mentions > that we have to do this but there is no real proposal yet for how to do it. > Only a rough sketch so that we don't forget about it. > > On Fri, 8 Jul 2016 at 07:47 Chen Qin <qinnc...@gmail.com> wrote: > > > +1 for allowedLateness scenario. > > > > The rationale behind is there are backfills or data issues hold in-window > > data till watermark pass end time. It cause sink write partial output. > > > > Allow high allowedLateness threshold makes life easier to merge those > > results and overwrite partial output with correct output at sink. But > yeah, > > pipeline author is at risk of blow up statebackend with huge states. > > > > Alternatively, In some case, if sink allows read-check-merge operation, > > window can explicit call out events ingested after allowedLateness. It > asks > > pipeline author mitigated these events in a way outside of flink > ecosystem. > > The catch is that since everywhere in a flink job can replay and > > checkpoint, notification should somehow includes these info as well. > > > > Thanks > > Chen > > > > On Thu, Jul 7, 2016 at 12:14 AM, Kostas Kloudas < > > k.klou...@data-artisans.com > > > wrote: > > > > > Hi, > > > > > > In the effort to move the discussion to the mailing list, rather than > the > > > doc, > > > there was a comment in the doc: > > > > > > “It seems this proposal marries the allowed lateness of events and the > > > discarding of window state. In most use cases this should be > sufficient, > > > but there are instances where having independent control of these may > be > > > useful. > > > > > > For instance, you may have a job that computes some aggregate, like a > > sum. > > > You may want to keep the window state around for a while, but not too > > long. > > > Yet you may want to continue processing late events after you discarded > > the > > > window state. It is possible that your stream sinks can make use of > this > > > data. For instance, they may be writing to a data store that returns an > > > error if a row already exists, which allow the sink to read the > existing > > > row and update it with the new data." > > > > > > To which I would like to reply: > > > > > > If I understand your use-case correctly, I believe that the proposed > > > binding of the allowed lateness to the state purging does not impose > any > > > problem. The lateness specifies the upper time bound, after which the > > state > > > will be discarded. Between the start of a window and its (end + > > > allowedLateness) you can write custom triggers that fire, purge the > > state, > > > or do nothing. Given this, I suppose that, at the most extreme case, > you > > > can specify an allowed lateness of Long.MaxValue and do the purging of > > the > > > state "manually". By doing this, you remove the safeguard of letting > the > > > system purge the state at some point in time, and you can do your own > > > custom state management that fits your needs. > > > > > > Thanks, > > > Kostas > > > > > > > On Jul 6, 2016, at 5:43 PM, Aljoscha Krettek <aljos...@apache.org> > > > wrote: > > > > > > > > @Vishnu Funny you should ask that because I have a design doc lying > > > around. > > > > I'll open a new mail thread to not hijack this one. > > > > > > > > On Wed, 6 Jul 2016 at 17:17 Vishnu Viswanath < > > > vishnu.viswanat...@gmail.com> > > > > wrote: > > > > > > > >> Hi, > > > >> > > > >> I was going through the suggested improvements in window, and I have > > > >> few questions/suggestion on improvement regarding the Evictor. > > > >> > > > >> 1) I am having a use case where I have to create a custom Evictor > that > > > will > > > >> evict elements from the window based on the value (e.g., if I have > > > elements > > > >> are of case class Item(id: Int, type:String) then evict elements > that > > > has > > > >> type="a"). I believe this is not currently possible. > > > >> 2) this is somewhat related to 1) where there should be an option to > > > evict > > > >> elements from anywhere in the window. not only from the beginning of > > the > > > >> window. (e.g., apply the delta function to all elements and remove > all > > > >> those don't pass. I checked the code and evict method just returns > the > > > >> number of elements to be removed and processTriggerResult just skips > > > those > > > >> many elements from the beginning. > > > >> 3) Add an option to enables the user to decide if the eviction > should > > > >> happen before the apply function or after the apply function. > > Currently > > > it > > > >> is before the apply function, but I have a use case where I need to > > > first > > > >> apply the function and evict afterward. > > > >> > > > >> I am doing these for a POC so I think I can modify the flink code > base > > > to > > > >> make these changes and build, but I would appreciate any suggestion > on > > > >> whether these are viable changes or will there any performance issue > > if > > > >> these are done. Also any pointer on where to start(e.g, do I create > a > > > new > > > >> class similar to EvictingWindowOperator that extends > WindowOperator?) > > > >> > > > >> Thanks and Regards, > > > >> Vishnu Viswanath, > > > >> > > > >> On Wed, Jul 6, 2016 at 9:39 AM, Aljoscha Krettek < > aljos...@apache.org > > > > > > >> wrote: > > > >> > > > >>> I did: > > > >>> > > > >>> > > > >> > > > > > > https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3ccanmxww0abttjjg9ewdxrugxkjm7jscbenmvrzohpt2qo3pq...@mail.gmail.com%3e > > > >>> ;-) > > > >>> > > > >>> On Wed, 6 Jul 2016 at 15:31 Ufuk Celebi <u...@apache.org> wrote: > > > >>> > > > >>>> On Wed, Jul 6, 2016 at 3:19 PM, Aljoscha Krettek < > > aljos...@apache.org > > > > > > > >>>> wrote: > > > >>>>> In the future, it might be good to to discussions directly on the > > ML > > > >>> and > > > >>>>> then change the document accordingly. This way everyone can > follow > > > >> the > > > >>>>> discussion on the ML. I also feel that Google Doc comments often > > > >> don't > > > >>>> give > > > >>>>> enough space for expressing more complex opinions. > > > >>>> > > > >>>> I agree! Would you mind raising this point as a separate > discussion > > on > > > >>> dev@ > > > >>>> ? > > > >>>> > > > >>> > > > >> > > > > > > > > >