[ https://issues.apache.org/jira/browse/FLINK-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898183#comment-15898183 ]

ASF GitHub Bot commented on FLINK-5047:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3364#discussion_r104527336
  
    --- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/dataset/DataSetWindowAggregateITCase.scala ---
    @@ -169,4 +168,228 @@ class DataSetWindowAggregateITCase(configMode: TableConfigMode)
           .toDataSet[Row]
       }
     
    +  @Test(expected = classOf[UnsupportedOperationException])
    +  def testAllEventTimeSlidingGroupWindowOverCount(): Unit = {
    +    val env = ExecutionEnvironment.getExecutionEnvironment
    +    val tEnv = TableEnvironment.getTableEnvironment(env, config)
    +
    +    val table = env.fromCollection(data).toTable(tEnv, 'long, 'int, 'string)
    +
    +    // Count sliding non-grouping windows on event-time are currently not supported
    +    table
    +      .window(Slide over 2.rows every 2.rows on 'long as 'w)
    +      .groupBy('w)
    +      .select('int.count)
    +      .toDataSet[Row]
    +  }
    +
    +  @Test
    +  def testAllEventTimeSlidingGroupWindowOverTime(): Unit = {
    +    // please keep this test in sync with the DataStream variant
    +    val env = ExecutionEnvironment.getExecutionEnvironment
    +    val tEnv = TableEnvironment.getTableEnvironment(env)
    +
    +    val table = env.fromCollection(data).toTable(tEnv, 'long, 'int, 'string)
    +
    +    val windowedTable = table
    +      .window(Slide over 5.milli every 2.milli on 'long as 'w)
    +      .groupBy('w)
    +      .select('int.count, 'w.start, 'w.end)
    +
    +    val expected =
    +      "1,1970-01-01 00:00:00.008,1970-01-01 00:00:00.013\n" +
    +      "1,1970-01-01 00:00:00.012,1970-01-01 00:00:00.017\n" +
    +      "1,1970-01-01 00:00:00.014,1970-01-01 00:00:00.019\n" +
    +      "1,1970-01-01 00:00:00.016,1970-01-01 00:00:00.021\n" +
    +      "2,1969-12-31 23:59:59.998,1970-01-01 00:00:00.003\n" +
    +      "2,1970-01-01 00:00:00.006,1970-01-01 00:00:00.011\n" +
    +      "3,1970-01-01 00:00:00.004,1970-01-01 00:00:00.009\n" +
    +      "4,1970-01-01 00:00:00.0,1970-01-01 00:00:00.005\n" +
    +      "4,1970-01-01 00:00:00.002,1970-01-01 00:00:00.007"
    +
    +    val results = windowedTable.toDataSet[Row].collect()
    +    TestBaseUtils.compareResultAsText(results.asJava, expected)
    +  }
    +
    +  @Test
    +  def testEventTimeSlidingGroupWindowOverTimeOverlappingFullPane(): Unit = {
    +    // please keep this test in sync with the DataStream variant
    +    val env = ExecutionEnvironment.getExecutionEnvironment
    +    val tEnv = TableEnvironment.getTableEnvironment(env)
    +
    +    val table = env.fromCollection(data).toTable(tEnv, 'long, 'int, 'string)
    +
    +    val windowedTable = table
    +      .window(Slide over 10.milli every 5.milli on 'long as 'w)
    +      .groupBy('string, 'w)
    +      .select('string, 'int.count, 'w.start, 'w.end)
    +
    +    val expected =
    +      "Hallo,1,1969-12-31 23:59:59.995,1970-01-01 00:00:00.005\n" +
    +      "Hallo,1,1970-01-01 00:00:00.0,1970-01-01 00:00:00.01\n" +
    +      "Hello world,1,1970-01-01 00:00:00.0,1970-01-01 00:00:00.01\n" +
    +      "Hello world,1,1970-01-01 00:00:00.005,1970-01-01 00:00:00.015\n" +
    +      "Hello world,1,1970-01-01 00:00:00.01,1970-01-01 00:00:00.02\n" +
    +      "Hello world,1,1970-01-01 00:00:00.015,1970-01-01 00:00:00.025\n" +
    +      "Hello,1,1970-01-01 00:00:00.005,1970-01-01 00:00:00.015\n" +
    +      "Hello,2,1969-12-31 23:59:59.995,1970-01-01 00:00:00.005\n" +
    +      "Hello,3,1970-01-01 00:00:00.0,1970-01-01 00:00:00.01\n" +
    +      "Hi,1,1969-12-31 23:59:59.995,1970-01-01 00:00:00.005\n" +
    +      "Hi,1,1970-01-01 00:00:00.0,1970-01-01 00:00:00.01"
    +
    +    val results = windowedTable.toDataSet[Row].collect()
    +    TestBaseUtils.compareResultAsText(results.asJava, expected)
    +  }
    +
    +  @Test
    +  def testEventTimeSlidingGroupWindowOverTimeOverlappingSplitPane(): Unit = {
    +    // please keep this test in sync with the DataStream variant
    +    val env = ExecutionEnvironment.getExecutionEnvironment
    +    val tEnv = TableEnvironment.getTableEnvironment(env)
    +
    +    val table = env.fromCollection(data).toTable(tEnv, 'long, 'int, 'string)
    +
    +    val windowedTable = table
    +      .window(Slide over 5.milli every 4.milli on 'long as 'w)
    +      .groupBy('string, 'w)
    +      .select('string, 'int.count, 'w.start, 'w.end)
    +
    +    val expected =
    +      "Hallo,1,1970-01-01 00:00:00.0,1970-01-01 00:00:00.005\n" +
    +      "Hello world,1,1970-01-01 00:00:00.004,1970-01-01 00:00:00.009\n" +
    +      "Hello world,1,1970-01-01 00:00:00.008,1970-01-01 00:00:00.013\n" +
    +      "Hello world,1,1970-01-01 00:00:00.012,1970-01-01 00:00:00.017\n" +
    +      "Hello world,1,1970-01-01 00:00:00.016,1970-01-01 00:00:00.021\n" +
    +      "Hello,2,1970-01-01 00:00:00.0,1970-01-01 00:00:00.005\n" +
    +      "Hello,2,1970-01-01 00:00:00.004,1970-01-01 00:00:00.009\n" +
    +      "Hi,1,1970-01-01 00:00:00.0,1970-01-01 00:00:00.005"
    +
    +    val results = windowedTable.toDataSet[Row].collect()
    +    TestBaseUtils.compareResultAsText(results.asJava, expected)
    +  }
    +
    +  @Test
    +  def testEventTimeSlidingGroupWindowOverTimeNonOverlappingFullPane(): Unit = {
    +    // please keep this test in sync with the DataStream variant
    +    val env = ExecutionEnvironment.getExecutionEnvironment
    +    val tEnv = TableEnvironment.getTableEnvironment(env)
    +
    +    val table = env.fromCollection(data).toTable(tEnv, 'long, 'int, 'string)
    +
    +    val windowedTable = table
    +      .window(Slide over 5.milli every 10.milli on 'long as 'w)
    +      .groupBy('string, 'w)
    +      .select('string, 'int.count, 'w.start, 'w.end)
    +
    +    val expected =
    +      "Hallo,1,1970-01-01 00:00:00.0,1970-01-01 00:00:00.005\n" +
    --- End diff --
    
    There are two result rows for the same window (Hallo, 00:00:00.0, 00:00:00.005). Is this correct?


> Add sliding group-windows for batch tables
> ------------------------------------------
>
>                 Key: FLINK-5047
>                 URL: https://issues.apache.org/jira/browse/FLINK-5047
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: Jark Wu
>            Assignee: Timo Walther
>
> Add Slide group-windows for batch tables as described in 
> [FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations].
> There are three ways to implement sliding windows for batch:
> 1. replicate the output in order to assign keys for overlapping windows. This 
> is probably the most straightforward implementation and supports any 
> aggregation function, but it blows up the data volume (a short sketch of this 
> replication approach is shown below the description).
> 2. if the aggregation functions are combinable / pre-aggregatable, we can 
> also find the largest tumbling window size from which the sliding windows can 
> be assembled. This is basically the technique used to express sliding windows 
> with plain SQL (GROUP BY + OVER clauses). For a sliding window Slide(10 
> minutes, 2 minutes) this would mean first computing aggregates of 
> non-overlapping (tumbling) 2-minute windows and then assembling 5 consecutive 
> ones into a sliding window (could be done in a MapPartition with sorted 
> input); a sketch of this pane-based assembly is also shown below the 
> description. The implementation could be done as an optimizer rule that splits 
> the sliding aggregate into a tumbling aggregate and a SQL WINDOW operator. 
> Maybe it makes sense to implement the WINDOW clause first and reuse it for 
> sliding windows.
> 3. A hybrid solution: do the pre-aggregation on the largest non-overlapping 
> windows (as in 2.) and then replicate these results and process them as in 
> approach 1. The benefits of this are that it a) is based on the implementation 
> that supports non-combinable aggregates (which is required in any case) and b) 
> does not require the implementation of the SQL WINDOW operator. Internally, 
> this can again be implemented as an optimizer rule that translates the 
> SlidingWindow into a pre-aggregating TumblingWindow and a final SlidingWindow 
> (with replication).
> see FLINK-4692 for more discussion
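
A minimal, self-contained Scala sketch of approach 1 (plain Scala rather than 
Flink code; the object name, the zero window offset, and the count-only 
aggregate are illustrative assumptions): every record is replicated once per 
sliding window that covers it, then grouped by the assigned window start and 
aggregated.

    object SlidingByReplicationSketch {

      /** All window starts (multiples of `slide`, offset 0) whose window [ws, ws + size) covers `ts`. */
      def coveringWindowStarts(ts: Long, size: Long, slide: Long): Seq[Long] = {
        val latest = ts - Math.floorMod(ts, slide)       // latest window start <= ts
        Iterator.iterate(latest)(_ - slide).takeWhile(ws => ws + size > ts).toSeq
      }

      def main(args: Array[String]): Unit = {
        val min = 60 * 1000L
        val events = Seq(1 * min, 3 * min, 3 * min, 9 * min, 11 * min)

        // Replicate: emit one (windowStart, event) pair per covering window, then count per window.
        // Each record is emitted size/slide times, which is where the data volume blows up.
        val counts = events
          .flatMap(ts => coveringWindowStarts(ts, size = 10 * min, slide = 2 * min).map(ws => ws -> ts))
          .groupBy(_._1)
          .map { case (ws, replicated) => ws -> replicated.size }

        counts.toSeq.sortBy(_._1).foreach { case (ws, cnt) =>
          println(s"window [$ws, ${ws + 10 * min}): $cnt events")
        }
      }
    }

On a DataSet this would roughly correspond to a flatMap followed by a groupBy 
and an aggregate; any aggregation function, including non-combinable ones, fits 
this shape.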
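
And a minimal sketch of approach 2 under the same assumptions (plain Scala, 
count aggregate only, zero window offset; this only works for combinable 
aggregates): pre-aggregate into non-overlapping panes whose width is the 
greatest common divisor of window size and slide, then assemble each sliding 
window by combining consecutive panes.

    object SlidingFromPanesSketch {

      private def gcd(a: Long, b: Long): Long = if (b == 0) a else gcd(b, a % b)

      /** Element counts per sliding window [start, start + size), assembled from pane pre-aggregates. */
      def slidingCounts(timestamps: Seq[Long], size: Long, slide: Long): Seq[(Long, Long)] = {
        val pane = gcd(size, slide)                // width of the tumbling pre-aggregation window
        val panesPerWindow = (size / pane).toInt   // Slide(10 minutes, 2 minutes) -> 5 panes per window

        // 1) Pre-aggregation: one partial count per non-overlapping pane (the "tumbling" step).
        val paneCounts: Map[Long, Long] = timestamps
          .groupBy(ts => ts - Math.floorMod(ts, pane))
          .map { case (paneStart, es) => paneStart -> es.size.toLong }

        // 2) Assembly: every window start (multiple of `slide`) that covers at least one non-empty pane.
        val windowStarts = paneCounts.keys
          .flatMap(ps => (0 until panesPerWindow).map(i => ps - i * pane))
          .filter(ws => Math.floorMod(ws, slide) == 0)
          .toSeq.distinct.sorted

        // Combine the partial counts of the panes covered by each window.
        windowStarts.map(ws => ws -> (0 until panesPerWindow).map(i => paneCounts.getOrElse(ws + i * pane, 0L)).sum)
      }

      def main(args: Array[String]): Unit = {
        val min = 60 * 1000L
        val events = Seq(1 * min, 3 * min, 3 * min, 9 * min, 11 * min)
        slidingCounts(events, size = 10 * min, slide = 2 * min).foreach { case (ws, cnt) =>
          println(s"window [$ws, ${ws + 10 * min}): $cnt events")
        }
      }
    }

Approach 3 from the description would keep step 1) and replace step 2) by 
replicating each pane result to the windows that cover it, as in the 
replication sketch above.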


