[ 
https://issues.apache.org/jira/browse/FLINK-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949717#comment-15949717
 ] 

ASF GitHub Bot commented on FLINK-5654:
---------------------------------------

Github user rtudoran commented on the issue:

    https://github.com/apache/flink/pull/3641
  
    @fhueske 
    Thanks for the feedback - i can of course do the modifications you 
mentioned. However, I do not believe that is the correct behavior (or better 
said not for all the needed cases). From my understanding of the semantic for 
OVER - even if the function would work on the time (proctime/eventtime) - we 
should still emit a value for every incoming event. I have 2 arguments for this:
    1) in our scenarios - we would use the current implementation for example 
to detect certain statistics for every incoming event, while the statistic 
focus is defined for a certain period of time (this is a functionality that is 
highly needed). For example if you apply this in a stock market scenario - you 
might want to say give me the sum of the transactions over the last hour (to 
verify potentially a threshold for liquidity of the market) and as the 
application would need to react on each incoming transaction (e.g. decide to 
buy or not to buy) - then working on the behavior you mentioned would not 
enable such a scenario. More than this, even if you would need both behaviors 
..then what query could you write to have the described behavior and make the 
differentiation from the other?
    2) if you think on the case of event time - which should be similar with 
proctime - then there it should be the same. When you get an event (ev1 , 
time1) - you should not emit this output until you would know if there is at 
some point later another event with the same event time (ev2, time1). Basically 
you would register the timer for the acceptable watermark/allowedlatency and 
accumulate the accumulator for a specific event time and emit it after the 
allowed latency has passed....is this the actual behavior that is implemented / 
would be implemented? 


> Add processing time OVER RANGE BETWEEN x PRECEDING aggregation to SQL
> ---------------------------------------------------------------------
>
>                 Key: FLINK-5654
>                 URL: https://issues.apache.org/jira/browse/FLINK-5654
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: radu
>
> The goal of this issue is to add support for OVER RANGE aggregations on 
> processing time streams to the SQL interface.
> Queries similar to the following should be supported:
> {code}
> SELECT 
>   a, 
>   SUM(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN INTERVAL '1' 
> HOUR PRECEDING AND CURRENT ROW) AS sumB,
>   MIN(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN INTERVAL '1' 
> HOUR PRECEDING AND CURRENT ROW) AS minB
> FROM myStream
> {code}
> The following restrictions should initially apply:
> - All OVER clauses in the same SELECT clause must be exactly the same.
> - The PARTITION BY clause is optional (no partitioning results in single 
> threaded execution).
> - The ORDER BY clause may only have procTime() as parameter. procTime() is a 
> parameterless scalar function that just indicates processing time mode.
> - UNBOUNDED PRECEDING is not supported (see FLINK-5657)
> - FOLLOWING is not supported.
> The restrictions will be resolved in follow up issues. If we find that some 
> of the restrictions are trivial to address, we can add the functionality in 
> this issue as well.
> This issue includes:
> - Design of the DataStream operator to compute OVER ROW aggregates
> - Translation from Calcite's RelNode representation (LogicalProject with 
> RexOver expression).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to