[ 
https://issues.apache.org/jira/browse/FLINK-22587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341174#comment-17341174
 ] 

Yun Gao commented on FLINK-22587:
---------------------------------

Hi [~echauchot], thanks very much for reporting this issue! I tried the example
you attached and found that the issue also exists in streaming mode. I checked
the implementation of WindowOperator, and it seems that it does not handle
windows that are still pending at the end of the stream, so the remaining
records are dropped. The issue would also exist for other kinds of windows,
e.g. pending count windows that have not received enough records by the end of
the stream.

To solve this issue, one possible solution in my view is to allow triggers to
specify the action to take at end of stream, since we might not have enough
information about what users expect. All existing triggers could return PURGE
at end of stream, and we could add a wrapper trigger that takes a
user-specified action there. We might also need to distinguish between an
actual end of stream / drain and stop-with-savepoint; in the latter case it
might be better to leave the window as it is.
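
To make the proposal concrete, here is a minimal, self-contained sketch of the
idea. It does not use Flink classes: TriggerResult is redefined locally with the
same four values as Flink's enum, and EndOfStreamAwareTrigger, onEndOfStream(),
CountTrigger, EndOfStreamTrigger and the run() driver are all hypothetical names
invented for illustration, not existing Flink API. It only models a single
window and does not model stop-with-savepoint, where the driver would simply
skip the end-of-stream call.

```java
import java.util.ArrayList;
import java.util.List;

final class EndOfStreamTriggerSketch {

    // Same four values as Flink's TriggerResult, redefined so the sketch compiles alone.
    enum TriggerResult { CONTINUE, FIRE, PURGE, FIRE_AND_PURGE }

    // Hypothetical contract: onEndOfStream() is the proposed hook, not part of Flink.
    interface EndOfStreamAwareTrigger {
        TriggerResult onElement(int pendingCount);
        TriggerResult onEndOfStream();
    }

    // Simplified count trigger: emits every `size` records and, as suggested
    // for existing triggers, returns PURGE at end of stream.
    static final class CountTrigger implements EndOfStreamAwareTrigger {
        private final int size;

        CountTrigger(int size) { this.size = size; }

        @Override
        public TriggerResult onElement(int pendingCount) {
            return pendingCount >= size ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onEndOfStream() { return TriggerResult.PURGE; }
    }

    // Wrapper trigger: delegates per-element decisions and overrides only the
    // end-of-stream action with the one chosen by the user.
    static final class EndOfStreamTrigger implements EndOfStreamAwareTrigger {
        private final EndOfStreamAwareTrigger delegate;
        private final TriggerResult atEndOfStream;

        EndOfStreamTrigger(EndOfStreamAwareTrigger delegate, TriggerResult atEndOfStream) {
            this.delegate = delegate;
            this.atEndOfStream = atEndOfStream;
        }

        @Override
        public TriggerResult onElement(int pendingCount) { return delegate.onElement(pendingCount); }

        @Override
        public TriggerResult onEndOfStream() { return atEndOfStream; }
    }

    // Toy single-window driver standing in for the window operator.
    static List<List<Integer>> run(EndOfStreamAwareTrigger trigger, int... records) {
        List<List<Integer>> emitted = new ArrayList<>();
        List<Integer> pending = new ArrayList<>();
        for (int record : records) {
            pending.add(record);
            apply(trigger.onElement(pending.size()), pending, emitted);
        }
        // Actual end of stream / drain: ask the trigger what to do with the pending window.
        apply(trigger.onEndOfStream(), pending, emitted);
        return emitted;
    }

    private static void apply(TriggerResult result, List<Integer> pending, List<List<Integer>> emitted) {
        boolean fire = result == TriggerResult.FIRE || result == TriggerResult.FIRE_AND_PURGE;
        boolean purge = result == TriggerResult.PURGE || result == TriggerResult.FIRE_AND_PURGE;
        if (fire && !pending.isEmpty()) {
            emitted.add(new ArrayList<>(pending));
        }
        if (purge) {
            pending.clear();
        }
    }

    public static void main(String[] args) {
        // Default behaviour: the incomplete last window (record 7) is dropped.
        System.out.println(run(new CountTrigger(3), 1, 2, 3, 4, 5, 6, 7));
        // prints [[1, 2, 3], [4, 5, 6]]

        // Wrapped: the pending window is emitted at end of stream.
        System.out.println(run(
                new EndOfStreamTrigger(new CountTrigger(3), TriggerResult.FIRE_AND_PURGE),
                1, 2, 3, 4, 5, 6, 7));
        // prints [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

With this shape, existing pipelines would keep their current output, and only
users who opt in through the wrapper would see the pending window emitted.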

> Support aggregations in batch mode with DataStream API
> ------------------------------------------------------
>
>                 Key: FLINK-22587
>                 URL: https://issues.apache.org/jira/browse/FLINK-22587
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Etienne Chauchot
>            Priority: Major
>
> A pipeline like this *in batch mode* would output no data
> {code:java}
> stream.join(otherStream)
>     .where(<KeySelector>)
>     .equalTo(<KeySelector>)
>     .window(GlobalWindows.create())
>     .apply(<JoinFunction>)
> {code}
> Indeed, the default trigger for GlobalWindow is NeverTrigger, which never 
> fires. If we set an _EventTimeTrigger_, it will fire with every element, as 
> the watermark will be set to +INF (batch mode) and will pass the end of the 
> global window with each new element. A _ProcessingTimeTrigger_ never fires 
> either, and all elapsed-time or delta-based triggers would not be suited for 
> batch.
> The same goes for _reduce()_ instead of join().
> So I guess we are missing something for batch support with DataStream.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
