[jira] [Commented] (FLINK-5653) Add processing time OVER ROWS BETWEEN x PRECEDING aggregation to SQL

ASF GitHub Bot (JIRA) Mon, 27 Mar 2017 02:27:07 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942922#comment-15942922
 ]


ASF GitHub Bot commented on FLINK-5653:
---------------------------------------

Github user huawei-flink commented on the issue:

    https://github.com/apache/flink/pull/3574
  
    @fhueske Thanks a lot of the clarification. I understand the issue better 
now, and see your attempt to make an average case that would work for both in 
memory as well as on external persistence. Considering RocksDB as the state of 
art, your choice sounds much more reasonable. We are well aware of the costs of 
serialization, and the impact is definitely important.  However, low latency 
systems with strict SLA will likely run just in memory. 
    
    The O(n) of the MapState is granted by the fact that time is monothonic and 
therefore the sequential reading is managed by the key timestamp. The cost of 
each O(1) in the hashmap increseas with the size of the window thou as you need 
to search through the map index. We definitely need better data access patterns 
for the state of "time series" types of data. 
    
    I will try to internalize it and provide the MapState implementation
    



> Add processing time OVER ROWS BETWEEN x PRECEDING aggregation to SQL
> --------------------------------------------------------------------
>
>                 Key: FLINK-5653
>                 URL: https://issues.apache.org/jira/browse/FLINK-5653
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: Stefano Bortoli
>
> The goal of this issue is to add support for OVER ROWS aggregations on 
> processing time streams to the SQL interface.
> Queries similar to the following should be supported:
> {code}
> SELECT 
>   a, 
>   SUM(b) OVER (PARTITION BY c ORDER BY procTime() ROWS BETWEEN 2 PRECEDING 
> AND CURRENT ROW) AS sumB,
>   MIN(b) OVER (PARTITION BY c ORDER BY procTime() ROWS BETWEEN 2 PRECEDING 
> AND CURRENT ROW) AS minB
> FROM myStream
> {code}
> The following restrictions should initially apply:
> - All OVER clauses in the same SELECT clause must be exactly the same.
> - The PARTITION BY clause is optional (no partitioning results in single 
> threaded execution).
> - The ORDER BY clause may only have procTime() as parameter. procTime() is a 
> parameterless scalar function that just indicates processing time mode.
> - UNBOUNDED PRECEDING is not supported (see FLINK-5656)
> - FOLLOWING is not supported.
> The restrictions will be resolved in follow up issues. If we find that some 
> of the restrictions are trivial to address, we can add the functionality in 
> this issue as well.
> This issue includes:
> - Design of the DataStream operator to compute OVER ROW aggregates
> - Translation from Calcite's RelNode representation (LogicalProject with 
> RexOver expression).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (FLINK-5653) Add processing time OVER ROWS BETWEEN x PRECEDING aggregation to SQL

Reply via email to