[ https://issues.apache.org/jira/browse/FLINK-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901344#comment-15901344 ]

ASF GitHub Bot commented on FLINK-5047:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3364#discussion_r104925507
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/AggregateUtil.scala ---
    @@ -186,6 +200,130 @@ object AggregateUtil {
       }
     
       /**
    +    * Create a [[org.apache.flink.api.common.functions.GroupReduceFunction]] that prepares for
    +    * partial aggregates of sliding windows (time and count-windows).
    +    * It requires a prepared input (with intermediate aggregate fields and aligned rowtime for
    +    * pre-tumbling in case of time-windows), pre-aggregates (pre-tumbles) rows, aligns the
    +    * window-start, and replicates or omits records for different panes of a sliding window.
    +    *
    +    * The output of the function contains the grouping keys, the intermediate aggregate values of
    +    * all aggregate function and the aligned window start. Window start must not be a timestamp,
    +    * but can also be a count value for count-windows.
    +    *
    +    * The output is stored in Row by the following format:
    +    *
    +    * {{{
    +    *                      avg(x) aggOffsetInRow = 2      count(z) aggOffsetInRow = 5
    +    *                            |                          |
    +    *                            v                          v
    +    *        +---------+---------+--------+--------+--------+--------+-------------+
    +    *        |groupKey1|groupKey2|  sum1  | count1 |  sum2  | count2 | windowStart |
    +    *        +---------+---------+--------+--------+--------+--------+-------------+
    +    *                                              ^                 ^
    +    *                                              |                 |
    +    *                                 sum(y) aggOffsetInRow = 4    window start for pane mapping
    +    * }}}
    +    *
    +    * NOTE: this function is only used for sliding windows with partial aggregates on batch tables.
    +    */
    +  def createDataSetSlideWindowPrepareGroupReduceFunction(
    +      window: LogicalWindow,
    +      namedAggregates: Seq[CalcitePair[AggregateCall, String]],
    +      groupings: Array[Int],
    +      inputType: RelDataType,
    +      isParserCaseSensitive: Boolean)
    +    : RichGroupReduceFunction[Row, Row] = {
    +
    +    val aggregates = transformToAggregateFunctions(
    +      namedAggregates.map(_.getKey),
    +      inputType,
    +      needRetraction = false)._2
    +
    +    val returnType: RowTypeInfo = createDataSetAggregateBufferDataType(
    +      groupings,
    +      aggregates,
    +      inputType,
    +      Some(Array(BasicTypeInfo.LONG_TYPE_INFO)))
    +
    +    window match {
    +      case EventTimeSlidingGroupWindow(_, _, size, slide) if isTimeInterval(size.resultType) =>
    +        // sliding time-window
    +        // for partial aggregations
    +        new DataSetSlideTimeWindowAggReduceCombineFunction(
    +          aggregates,
    +          groupings.length,
    +          returnType.getArity - 1,
    +          asLong(size),
    +          asLong(slide),
    +          returnType)
    +
    +      case _ =>
    +        throw new UnsupportedOperationException(s"$window is currently not supported on batch.")
    +    }
    +  }
    +
    +  /**
    +    * Create a [[org.apache.flink.api.common.functions.FlatMapFunction]] that prepares for
    +    * non-incremental aggregates of sliding windows (time-windows).
    +    *
    +    * It requires a prepared input (with intermediate aggregate fields), aligns the
    +    * window-start, and replicates or omits records for different panes of a sliding window.
    +    *
    +    * The output of the function contains the grouping keys, the intermediate aggregate values of
    +    * all aggregate function and the aligned window start.
    +    *
    +    * The output is stored in Row by the following format:
    +    *
    +    * {{{
    +    *                      avg(x) aggOffsetInRow = 2      count(z) aggOffsetInRow = 5
    +    *                            |                          |
    +    *                            v                          v
    +    *        +---------+---------+--------+--------+--------+--------+-------------+
    +    *        |groupKey1|groupKey2|  sum1  | count1 |  sum2  | count2 | windowStart |
    +    *        +---------+---------+--------+--------+--------+--------+-------------+
    +    *                                              ^                 ^
    +    *                                              |                 |
    +    *                                 sum(y) aggOffsetInRow = 4      window start for pane mapping
    +    * }}}
    +    *
    +    * NOTE: this function is only used for time-based sliding windows on batch tables.
    +    */
    +  def createDataSetSlideWindowPrepareFlatMapFunction(
    +      window: LogicalWindow,
    +      namedAggregates: Seq[CalcitePair[AggregateCall, String]],
    +      groupings: Array[Int],
    +      inputType: RelDataType,
    +      isParserCaseSensitive: Boolean)
    +    : FlatMapFunction[Row, Row] = {
    +
    +    val aggregates = transformToAggregateFunctions(
    +      namedAggregates.map(_.getKey),
    +      inputType,
    +      needRetraction = false)._2
    +
    +    val mapReturnType: RowTypeInfo = createDataSetAggregateBufferDataType(
    --- End diff --
    
    The output type should be equal to the input type.


> Add sliding group-windows for batch tables
> ------------------------------------------
>
>                 Key: FLINK-5047
>                 URL: https://issues.apache.org/jira/browse/FLINK-5047
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: Jark Wu
>            Assignee: Timo Walther
>
> Add Slide group-windows for batch tables as described in 
> [FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations].
> There are two ways to implement sliding windows for batch:
> 1. replicate the output in order to assign keys for overlapping windows. This 
> is probably the more straight-forward implementation and supports any 
> aggregation function but blows up the data volume.
> 2. if the aggregation functions are combinable / pre-aggregatable, we can 
> also find the largest tumbling window size from which the sliding windows can 
> be assembled. This is basically the technique used to express sliding windows 
> with plain SQL (GROUP BY + OVER clauses). For a sliding window Slide(10 
> minutes, 2 minutes) this would mean first computing aggregates of 
> non-overlapping (tumbling) 2-minute windows and then assembling 5 consecutive 
> ones into a sliding window (could be done in a MapPartition with sorted 
> input). The implementation could be done as an optimizer rule to split the 
> sliding aggregate into a tumbling aggregate and a SQL WINDOW operator. Maybe 
> it makes sense to implement the WINDOW clause first and reuse this for 
> sliding windows.
> 3. There is also a third, hybrid solution: do the pre-aggregation on the 
> largest non-overlapping windows (as in 2), replicate these results, and 
> process them as in approach 1. The benefits are that it a) is 
> based on the implementation that supports non-combinable aggregates (which is 
> required in any case) and b) does not require the implementation of 
> the SQL WINDOW operator. Internally, this can again be implemented as an 
> optimizer rule that translates the SlidingWindow into a pre-aggregating 
> TumblingWindow and a final SlidingWindow (with replication).
> See FLINK-4692 for more discussion.
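
The pane idea behind approaches 2 and 3 can be sketched with plain Scala collections. This is only an illustration under stated assumptions, not the code from the pull request: the object `SlidingPanes` and its methods are hypothetical names, timestamps are plain `Long` values in an arbitrary unit, `sum` stands in for any combinable aggregate, and the window size is assumed to be a multiple of the slide.

```scala
// Hypothetical helper for illustration; none of these names exist in Flink.
object SlidingPanes {

  /** Start of the tumbling pane of length `slide` that contains `ts`. */
  def paneStart(ts: Long, slide: Long): Long =
    ts - java.lang.Math.floorMod(ts, slide)

  /** Approach 1: starts of all sliding windows (size, slide) that contain `ts`. */
  def windowStarts(ts: Long, size: Long, slide: Long): List[Long] = {
    val last = paneStart(ts, slide)
    // Walk back in steps of `slide` while the window [start, start + size) still covers `ts`.
    Iterator.iterate(last)(_ - slide).takeWhile(_ > ts - size).toList.reverse
  }

  /**
    * Approach 3 (hybrid): pre-aggregate per tumbling pane, replicate each pane
    * result to every sliding window that contains the pane, then aggregate per window.
    * Records are (timestamp, value) pairs; the result maps window start to sum.
    */
  def slidingSums(records: Seq[(Long, Long)], size: Long, slide: Long): Map[Long, Long] = {
    require(size % slide == 0, "sketch assumes size is a multiple of slide")

    // Step 1: tumbling pre-aggregation, one partial sum per pane.
    val paneSums: Map[Long, Long] = records
      .groupBy { case (ts, _) => paneStart(ts, slide) }
      .map { case (start, rs) => start -> rs.map(_._2).sum }

    // Step 2: replicate pane sums to their windows and combine them per window start.
    paneSums.toSeq
      .flatMap { case (start, sum) => windowStarts(start, size, slide).map(_ -> sum) }
      .groupBy(_._1)
      .map { case (windowStart, xs) => windowStart -> xs.map(_._2).sum }
  }
}
```

For Slide(10, 2), `SlidingPanes.windowStarts(5L, 10L, 2L)` yields the five window starts -4, -2, 0, 2 and 4, which matches the "5 consecutive tumbling windows" in the description. Replicating pre-aggregated panes rather than raw records keeps the replication factor at size / slide per pane instead of per input row.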



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
