[
https://issues.apache.org/jira/browse/FLINK-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949717#comment-15949717
]
ASF GitHub Bot commented on FLINK-5654:
---------------------------------------
Github user rtudoran commented on the issue:
https://github.com/apache/flink/pull/3641
@fhueske
Thanks for the feedback - i can of course do the modifications you
mentioned. However, I do not believe that is the correct behavior (or better
said not for all the needed cases). From my understanding of the semantic for
OVER - even if the function would work on the time (proctime/eventtime) - we
should still emit a value for every incoming event. I have 2 arguments for this:
1) in our scenarios - we would use the current implementation for example
to detect certain statistics for every incoming event, while the statistic
focus is defined for a certain period of time (this is a functionality that is
highly needed). For example if you apply this in a stock market scenario - you
might want to say give me the sum of the transactions over the last hour (to
verify potentially a threshold for liquidity of the market) and as the
application would need to react on each incoming transaction (e.g. decide to
buy or not to buy) - then working on the behavior you mentioned would not
enable such a scenario. More than this, even if you would need both behaviors
..then what query could you write to have the described behavior and make the
differentiation from the other?
2) if you think on the case of event time - which should be similar with
proctime - then there it should be the same. When you get an event (ev1 ,
time1) - you should not emit this output until you would know if there is at
some point later another event with the same event time (ev2, time1). Basically
you would register the timer for the acceptable watermark/allowedlatency and
accumulate the accumulator for a specific event time and emit it after the
allowed latency has passed....is this the actual behavior that is implemented /
would be implemented?
> Add processing time OVER RANGE BETWEEN x PRECEDING aggregation to SQL
> ---------------------------------------------------------------------
>
> Key: FLINK-5654
> URL: https://issues.apache.org/jira/browse/FLINK-5654
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: Fabian Hueske
> Assignee: radu
>
> The goal of this issue is to add support for OVER RANGE aggregations on
> processing time streams to the SQL interface.
> Queries similar to the following should be supported:
> {code}
> SELECT
> a,
> SUM(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN INTERVAL '1'
> HOUR PRECEDING AND CURRENT ROW) AS sumB,
> MIN(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN INTERVAL '1'
> HOUR PRECEDING AND CURRENT ROW) AS minB
> FROM myStream
> {code}
> The following restrictions should initially apply:
> - All OVER clauses in the same SELECT clause must be exactly the same.
> - The PARTITION BY clause is optional (no partitioning results in single
> threaded execution).
> - The ORDER BY clause may only have procTime() as parameter. procTime() is a
> parameterless scalar function that just indicates processing time mode.
> - UNBOUNDED PRECEDING is not supported (see FLINK-5657)
> - FOLLOWING is not supported.
> The restrictions will be resolved in follow up issues. If we find that some
> of the restrictions are trivial to address, we can add the functionality in
> this issue as well.
> This issue includes:
> - Design of the DataStream operator to compute OVER ROW aggregates
> - Translation from Calcite's RelNode representation (LogicalProject with
> RexOver expression).
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)