[jira] [Commented] (FLINK-5653) Add processing time OVER ROWS BETWEEN x PRECEDING aggregation to SQL

ASF GitHub Bot (JIRA) Thu, 16 Mar 2017 02:20:08 -0700

    [ https://issues.apache.org/jira/browse/FLINK-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927713#comment-15927713 ]


ASF GitHub Bot commented on FLINK-5653:
---------------------------------------

Github user huawei-flink commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3547#discussion_r106369028
  
    --- Diff: flink-libraries/flink-table/src/test/java/org/apache/flink/table/api/java/stream/sql/ProcTimeRowStreamAggregationSqlITCase.java ---
    @@ -0,0 +1,317 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.table.api.java.stream.sql;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.flink.api.java.tuple.Tuple5;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase;
    +import org.apache.flink.table.api.Table;
    +import org.apache.flink.table.api.TableEnvironment;
    +import org.apache.flink.table.api.java.StreamTableEnvironment;
    +import org.apache.flink.table.api.java.stream.utils.StreamTestData;
    +import org.apache.flink.table.api.scala.stream.utils.StreamITCase;
    +import org.apache.flink.types.Row;
    +import org.junit.Ignore;
    +import org.junit.Test;
    +
    +public class ProcTimeRowStreamAggregationSqlITCase extends StreamingMultipleProgramsTestBase {
    +
    +   
    +   @Test
    +   public void testMaxAggregatation() throws Exception {
    +           StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    +           StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
    +           StreamITCase.clear();
    +
    +           env.setParallelism(1);
    +           
    +           DataStream<Tuple5<Integer, Long, Integer, String, Long>> ds = StreamTestData.get5TupleDataStream(env);
    +           Table in = tableEnv.fromDataStream(ds, "a,b,c,d,e");
    +           tableEnv.registerTable("MyTable", in);
    +
    +           String sqlQuery = "SELECT a, MAX(c) OVER (PARTITION BY a ORDER BY procTime() ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS maxC FROM MyTable";
    +           Table result = tableEnv.sql(sqlQuery);
    +
    +           DataStream<Row> resultSet = tableEnv.toDataStream(result, Row.class);
    +           resultSet.addSink(new StreamITCase.StringSink());
    +           env.execute();
    +
    +           List<String> expected = new ArrayList<>();
    +           expected.add("1,0");
    +           expected.add("2,1");
    +           expected.add("2,2");
    +           expected.add("3,3");
    +           expected.add("3,4");
    +           expected.add("3,5");
    +           expected.add("4,6");
    +           expected.add("4,7");
    +           expected.add("4,8");
    +           expected.add("4,9");
    +           expected.add("5,10");
    +           expected.add("5,11");
    +           expected.add("5,12");
    +           expected.add("5,14");
    +           expected.add("5,14");
    +
    +           StreamITCase.compareWithList(expected);
    +   }
    +   
    +   @Test
    +   public void testMinAggregatation() throws Exception {
    +           StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    +           StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
    +           StreamITCase.clear();
    +
    +           env.setParallelism(1);
    +           
    +           DataStream<Tuple5<Integer, Long, Integer, String, Long>> ds = StreamTestData.get5TupleDataStream(env);
    +           Table in = tableEnv.fromDataStream(ds, "a,b,c,d,e");
    +           tableEnv.registerTable("MyTable", in);
    +
    +           String sqlQuery = "SELECT a, MIN(c) OVER (PARTITION BY a ORDER BY procTime() ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS maxC FROM MyTable";
    +           Table result = tableEnv.sql(sqlQuery);
    +
    +           DataStream<Row> resultSet = tableEnv.toDataStream(result, Row.class);
    +           resultSet.addSink(new StreamITCase.StringSink());
    +           env.execute();
    +
    +           List<String> expected = new ArrayList<>();
    +           expected.add("1,0");
    +           expected.add("2,1");
    +           expected.add("2,1");
    +           expected.add("3,3");
    +           expected.add("3,3");
    +           expected.add("3,4");
    +           expected.add("4,6");
    +           expected.add("4,6");
    +           expected.add("4,7");
    +           expected.add("4,8");
    +           expected.add("5,10");
    +           expected.add("5,10");
    +           expected.add("5,11");
    +           expected.add("5,12");
    +           expected.add("5,13");
    +
    +           StreamITCase.compareWithList(expected);
    +   }
    +   
    +   @Test
    +   public void testSumAggregatation() throws Exception {
    +           StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    +           StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
    +           StreamITCase.clear();
    +
    +           env.setParallelism(1);
    +           
    +           DataStream<Tuple5<Integer, Long, Integer, String, Long>> ds = StreamTestData.get5TupleDataStream(env);
    +           Table in = tableEnv.fromDataStream(ds, "a,b,c,d,e");
    +           tableEnv.registerTable("MyTable", in);
    +
    +           String sqlQuery = "SELECT a, SUM(c) OVER (PARTITION BY a ORDER BY procTime() ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sumC FROM MyTable";
    +           Table result = tableEnv.sql(sqlQuery);
    +
    +           DataStream<Row> resultSet = tableEnv.toDataStream(result, Row.class);
    +           resultSet.addSink(new StreamITCase.StringSink());
    +           env.execute();
    +
    +           List<String> expected = new ArrayList<>();
    +           expected.add("1,0");
    +           expected.add("2,1");
    +           expected.add("2,3");
    +           expected.add("3,3");
    +           expected.add("3,7");
    +           expected.add("3,9");
    +           expected.add("4,6");
    +           expected.add("4,13");
    +           expected.add("4,15");
    +           expected.add("4,17");
    +           expected.add("5,10");
    +           expected.add("5,21");
    +           expected.add("5,23");
    +           expected.add("5,26");
    +           expected.add("5,27");
    --- End diff --
    
    Besides the fix you suggested, which is trivial (and thanks! :-)), it
    sounds a little odd to define a range with exclusive boundaries. Could you
    please point out the document in which this is discussed?
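
To make the boundary question concrete, here is a minimal plain-Java sketch (not Flink code; the class and method names are hypothetical) of a `ROWS BETWEEN n PRECEDING AND CURRENT ROW` frame with both bounds inclusive, which is how standard SQL defines the frame. It assumes the `c` values for `a = 3` arrive in the order 3, 4, 5, as the MAX and MIN expectations in the diff suggest. Under that assumption an inclusive `2 PRECEDING` frame covers up to three rows and sums to 3, 7, 12, whereas the 3, 7, 9 expected in the diff matches a two-row frame.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; this is not the operator from the pull request.
public class RowsWindowSketch {

    /** SUM over ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW, both bounds inclusive. */
    public static List<Integer> sumOver(List<Integer> values, int preceding) {
        Deque<Integer> window = new ArrayDeque<>();
        List<Integer> out = new ArrayList<>();
        int sum = 0;
        for (int v : values) {
            window.addLast(v);
            sum += v;
            // Keep at most preceding + 1 rows: the current row plus `preceding` earlier ones.
            if (window.size() > preceding + 1) {
                sum -= window.removeFirst();
            }
            out.add(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed arrival order of c for the partition a = 3.
        List<Integer> c = Arrays.asList(3, 4, 5);
        System.out.println(sumOver(c, 2)); // inclusive 2 PRECEDING: prints [3, 7, 12]
        System.out.println(sumOver(c, 1)); // two-row frame: prints [3, 7, 9]
    }
}
```

The same comparison applies to the MIN expectations: under the same assumed input, a two-row frame gives 3, 3, 4, which is what the diff expects, while an inclusive three-row frame would give 3, 3, 3.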
    



> Add processing time OVER ROWS BETWEEN x PRECEDING aggregation to SQL
> --------------------------------------------------------------------
>
>                 Key: FLINK-5653
>                 URL: https://issues.apache.org/jira/browse/FLINK-5653
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: Stefano Bortoli
>
> The goal of this issue is to add support for OVER ROWS aggregations on 
> processing time streams to the SQL interface.
> Queries similar to the following should be supported:
> {code}
> SELECT
>   a,
>   SUM(b) OVER (PARTITION BY c ORDER BY procTime()
>                ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sumB,
>   MIN(b) OVER (PARTITION BY c ORDER BY procTime()
>                ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS minB
> FROM myStream
> {code}
> The following restrictions should initially apply:
> - All OVER clauses in the same SELECT clause must be exactly the same.
> - The PARTITION BY clause is optional (no partitioning results in single 
> threaded execution).
> - The ORDER BY clause may only have procTime() as parameter. procTime() is a 
> parameterless scalar function that just indicates processing time mode.
> - UNBOUNDED PRECEDING is not supported (see FLINK-5656)
> - FOLLOWING is not supported.
> The restrictions will be resolved in follow up issues. If we find that some 
> of the restrictions are trivial to address, we can add the functionality in 
> this issue as well.
> This issue includes:
> - Design of the DataStream operator to compute OVER ROW aggregates
> - Translation from Calcite's RelNode representation (LogicalProject with 
> RexOver expression).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
