[jira] [Commented] (FLINK-5047) Add sliding group-windows for batch tables

ASF GitHub Bot (JIRA) Mon, 06 Mar 2017 13:42:06 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898185#comment-15898185
 ]


ASF GitHub Bot commented on FLINK-5047:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3364#discussion_r104518276
  
    --- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/DataSetSlideWindowAggReduceCombineFunction.scala
 ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.table.runtime.aggregate
    +
    +import java.lang.Iterable
    +
    +import org.apache.flink.api.common.functions.CombineFunction
    +import org.apache.flink.types.Row
    +
    +/**
    +  * Wraps the aggregate logic inside of
    +  * [[org.apache.flink.api.java.operators.GroupReduceOperator]] and
    +  * [[org.apache.flink.api.java.operators.GroupCombineOperator]].
    +  *
    +  * It is used for sliding on batch for both time and count-windows.
    +  *
    +  * @param aggregates aggregate functions.
    +  * @param groupKeysMapping index mapping of group keys between 
intermediate aggregate Row
    +  *                         and output Row.
    +  * @param aggregateMapping index mapping between aggregate function list 
and aggregated value
    +  *                         index in output Row.
    +  * @param intermediateRowArity intermediate row field count
    +  * @param finalRowArity output row field count
    +  * @param finalRowWindowStartPos relative window-start position to last 
field of output row
    +  * @param finalRowWindowEndPos relative window-end position to last field 
of output row
    +  * @param windowSize size of the window, used to determine window-end for 
output row
    +  */
    +class DataSetSlideWindowAggReduceCombineFunction(
    +    aggregates: Array[Aggregate[_ <: Any]],
    +    groupKeysMapping: Array[(Int, Int)],
    +    aggregateMapping: Array[(Int, Int)],
    +    intermediateRowArity: Int,
    +    finalRowArity: Int,
    +    finalRowWindowStartPos: Option[Int],
    +    finalRowWindowEndPos: Option[Int],
    +    windowSize: Long)
    +  extends DataSetSlideWindowAggReduceGroupFunction(
    +    aggregates,
    +    groupKeysMapping,
    +    aggregateMapping,
    +    intermediateRowArity,
    +    finalRowArity,
    +    finalRowWindowStartPos,
    +    finalRowWindowEndPos,
    +    windowSize)
    +  with CombineFunction[Row, Row] {
    +
    +  override def combine(records: Iterable[Row]): Row = {
    +    // initiate intermediate aggregate value
    +    aggregates.foreach(_.initiate(aggregateBuffer))
    +
    +    val iterator = records.iterator()
    +    while (iterator.hasNext) {
    +      val record = iterator.next()
    +      aggregates.foreach(_.merge(record, aggregateBuffer))
    +
    +      // check if this record is the last record
    +      if (!iterator.hasNext) {
    +        // set group keys to aggregateBuffer
    +        for (i <- groupKeysMapping.indices) {
    +          aggregateBuffer.setField(i, record.getField(i))
    +        }
    +
    +        aggregateBuffer.setField(windowStartFieldPos, 
record.getField(windowStartFieldPos))
    +
    +        return aggregateBuffer
    +      }
    +    }
    +
    +    // this code path should never be reached as we return before the loop 
finishes
    +    throw new IllegalArgumentException("Group is empty. This should never 
happen.")
    --- End diff --
    
    Can be removed. Combine is only called if there is at least one record. 


> Add sliding group-windows for batch tables
> ------------------------------------------
>
>                 Key: FLINK-5047
>                 URL: https://issues.apache.org/jira/browse/FLINK-5047
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: Jark Wu
>            Assignee: Timo Walther
>
> Add Slide group-windows for batch tables as described in 
> [FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations].
> There are two ways to implement sliding windows for batch:
> 1. replicate the output in order to assign keys for overlapping windows. This 
> is probably the more straight-forward implementation and supports any 
> aggregation function but blows up the data volume.
> 2. if the aggregation functions are combinable / pre-aggregatable, we can 
> also find the largest tumbling window size from which the sliding windows can 
> be assembled. This is basically the technique used to express sliding windows 
> with plain SQL (GROUP BY + OVER clauses). For a sliding window Slide(10 
> minutes, 2 minutes) this would mean to first compute aggregates of 
> non-overlapping (tumbling) 2 minute windows and assembling consecutively 5 of 
> these into a sliding window (could be done in a MapPartition with sorted 
> input). The implementation could be done as an optimizer rule to split the 
> sliding aggregate into a tumbling aggregate and a SQL WINDOW operator. Maybe 
> it makes sense to implement the WINDOW clause first and reuse this for 
> sliding windows.
> 3. There is also a third, hybrid solution: Doing the pre-aggregation on the 
> largest non-overlapping windows (as in 2) and replicating these results and 
> processing those as in the 1) approach. The benefits of this is that it a) is 
> based on the implementation that supports non-combinable aggregates (which is 
> required in any case) and b) that it does not require the implementation of 
> the SQL WINDOW operator. Internally, this can be implemented again as an 
> optimizer rule that translates the SlidingWindow into a pre-aggregating 
> TublingWindow and a final SlidingWindow (with replication).
> see FLINK-4692 for more discussion



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (FLINK-5047) Add sliding group-windows for batch tables

Reply via email to