[jira] [Commented] (FLINK-22587) Support aggregations in batch mode with DataStream API

Etienne Chauchot (Jira) Thu, 17 Jun 2021 06:08:32 -0700


    [ https://issues.apache.org/jira/browse/FLINK-22587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364931#comment-17364931 ]


Etienne Chauchot commented on FLINK-22587:
------------------------------------------

[~sjwiesman] FYI, here is the email I sent to the dev mailing list, in case the 
manual join is useful to others: 

https://lists.apache.org/thread.html/rcb31ee69c0ea1bcdaede27ae783d331c8cb137b0740444fa20b2c9c1%40%3Cdev.flink.apache.org%3E

> Support aggregations in batch mode with DataStream API
> ------------------------------------------------------
>
>                 Key: FLINK-22587
>                 URL: https://issues.apache.org/jira/browse/FLINK-22587
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: Etienne Chauchot
>            Priority: Major
>
> A pipeline like this *in batch mode* produces no output:
> {code:java}
> stream.join(otherStream)
>     .where(<KeySelector>)
>     .equalTo(<KeySelector>)
>     .window(GlobalWindows.create())
>     .apply(<JoinFunction>)
> {code}
> Indeed, the default trigger for GlobalWindows is NeverTrigger, which never 
> fires. If we set an _EventTimeTrigger_, it fires with every element, because 
> the watermark is set to +INF in batch mode and therefore passes the end of 
> the global window with each new element. A _ProcessingTimeTrigger_ never 
> fires either, and elapsed-time or delta-based triggers are not suited to 
> batch.
> The same goes for _reduce()_ instead of join().
> So I guess we are missing something for batch support in the DataStream API.
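
To make the trigger reasoning in the description concrete, here is a minimal, self-contained sketch in plain Java. It is a simplified model, not Flink code: the class, method, and constant names are hypothetical, and it assumes, as the description states, that the batch-mode watermark is +INF (modeled as Long.MAX_VALUE) and that a global window ends at the largest possible timestamp.

```java
// Simplified model of the trigger behavior described in the issue.
// All names are hypothetical; this is NOT Flink's implementation.
public class GlobalWindowTriggerSketch {

    enum TriggerResult { CONTINUE, FIRE }

    // Assumption: a global window has no real end, modeled as the largest long.
    public static final long GLOBAL_WINDOW_END = Long.MAX_VALUE;

    // Assumption taken from the description: in batch mode the watermark is +INF.
    public static final long BATCH_WATERMARK = Long.MAX_VALUE;

    // Models a trigger that never fires, whatever arrives.
    static TriggerResult neverTrigger(long windowEnd, long watermark) {
        return TriggerResult.CONTINUE;
    }

    // Models an event-time trigger: fire once the watermark has reached the window end.
    static TriggerResult eventTimeTrigger(long windowEnd, long watermark) {
        return windowEnd <= watermark ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    // Counts how many of the given number of element arrivals cause a firing.
    public static int countFirings(int elements, long watermark, boolean useEventTimeTrigger) {
        int firings = 0;
        for (int i = 0; i < elements; i++) {
            TriggerResult result = useEventTimeTrigger
                    ? eventTimeTrigger(GLOBAL_WINDOW_END, watermark)
                    : neverTrigger(GLOBAL_WINDOW_END, watermark);
            if (result == TriggerResult.FIRE) {
                firings++;
            }
        }
        return firings;
    }

    public static void main(String[] args) {
        System.out.println("never-firing trigger: "
                + countFirings(3, BATCH_WATERMARK, false) + " firings");
        System.out.println("event-time trigger:   "
                + countFirings(3, BATCH_WATERMARK, true) + " firings");
    }
}
```

Under these assumptions, the never-firing trigger produces no firing at all (hence no output), while the event-time trigger fires once per element instead of once at the end of the input. Neither gives the single end-of-input result one would expect from a batch aggregation.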



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
