[
https://issues.apache.org/jira/browse/FLINK-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898187#comment-15898187
]
ASF GitHub Bot commented on FLINK-5047:
---------------------------------------
Github user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/3364#discussion_r104524635
--- Diff:
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/DataSetSlideWindowAggReduceGroupFunction.scala
---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import java.lang.Iterable
+
+import org.apache.flink.api.common.functions.RichGroupReduceFunction
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.types.Row
+import org.apache.flink.util.{Collector, Preconditions}
+
+/**
+ * It wraps the aggregate logic inside of
+ * [[org.apache.flink.api.java.operators.GroupReduceOperator]].
+ *
+ * It is used for sliding on batch for both time and count-windows.
+ *
+ * @param aggregates aggregate functions.
+ * @param groupKeysMapping index mapping of group keys between intermediate aggregate Row
+ * and output Row.
+ * @param aggregateMapping index mapping between aggregate function list and aggregated value
+ * index in output Row.
+ * @param intermediateRowArity intermediate row field count
+ * @param finalRowArity output row field count
+ * @param finalRowWindowStartPos relative window-start position to last field of output row
+ * @param finalRowWindowEndPos relative window-end position to last field of output row
+ * @param windowSize size of the window, used to determine window-end for output row
+ */
+class DataSetSlideWindowAggReduceGroupFunction(
+ aggregates: Array[Aggregate[_ <: Any]],
+ groupKeysMapping: Array[(Int, Int)],
+ aggregateMapping: Array[(Int, Int)],
+ intermediateRowArity: Int,
+ finalRowArity: Int,
+ finalRowWindowStartPos: Option[Int],
+ finalRowWindowEndPos: Option[Int],
+ windowSize: Long)
+ extends RichGroupReduceFunction[Row, Row] {
+
+ protected var aggregateBuffer: Row = _
+ protected var windowStartFieldPos: Int = _
+
+ private var collector: TimeWindowPropertyCollector = _
+ private var output: Row = _
+
+ override def open(config: Configuration) {
+ Preconditions.checkNotNull(aggregates)
--- End diff --
Except for the initialization of `TimeWindowPropertyCollector`, everything
can be moved into the constructor.
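A minimal sketch of what that could look like, assuming the rest of `open()` (not shown in the diff above) only allocates the two reusable rows. `Row`, `Aggregate`, and `WindowPropertyCollector` are simplified stand-ins defined inside the sketch so that it compiles without Flink on the classpath; they are not the real classes, and the parameter list is shortened for illustration.

```scala
object ConstructorInitSketch {

  // Hypothetical stand-ins for the Flink types, only so the sketch is self-contained.
  final class Row(val arity: Int)
  trait Aggregate
  final class WindowPropertyCollector

  class SlideWindowReduceSketch(
      aggregates: Array[Aggregate],
      intermediateRowArity: Int,
      finalRowArity: Int) {

    // Constructor body: runs once when the function object is created.
    require(aggregates != null, "aggregates must not be null")

    // Reusable rows are allocated up front instead of in open().
    // (Public here only so the sketch is easy to inspect.)
    val aggregateBuffer: Row = new Row(intermediateRowArity)
    val output: Row = new Row(finalRowArity)

    // The collector is the one piece that stays in open().
    var collector: WindowPropertyCollector = null

    def open(): Unit = {
      collector = new WindowPropertyCollector
    }
  }
}
```

Anything assigned in the constructor becomes part of the function object that Flink serializes and ships to the workers, so this presumes those fields are serializable.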
> Add sliding group-windows for batch tables
> ------------------------------------------
>
> Key: FLINK-5047
> URL: https://issues.apache.org/jira/browse/FLINK-5047
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: Jark Wu
> Assignee: Timo Walther
>
> Add Slide group-windows for batch tables as described in
> [FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations].
> There are three ways to implement sliding windows for batch:
> 1. Replicate the output in order to assign keys for overlapping windows. This
> is probably the more straightforward implementation and supports any
> aggregation function, but it blows up the data volume.
> 2. If the aggregation functions are combinable / pre-aggregatable, we can
> also find the largest tumbling window size from which the sliding windows can
> be assembled. This is basically the technique used to express sliding windows
> with plain SQL (GROUP BY + OVER clauses). For a sliding window Slide(10
> minutes, 2 minutes) this would mean first computing aggregates of
> non-overlapping (tumbling) 2-minute windows and then assembling 5 consecutive
> ones into a sliding window (could be done in a MapPartition with sorted
> input). The implementation could be done as an optimizer rule that splits the
> sliding aggregate into a tumbling aggregate and a SQL WINDOW operator. Maybe
> it makes sense to implement the WINDOW clause first and reuse this for
> sliding windows.
> 3. There is also a third, hybrid solution: do the pre-aggregation on the
> largest non-overlapping windows (as in 2), then replicate these results and
> process them as in approach 1. The benefits of this are that it a) is
> based on the implementation that supports non-combinable aggregates (which is
> required in any case) and b) does not require the implementation of
> the SQL WINDOW operator. Internally, this can again be implemented as an
> optimizer rule that translates the SlidingWindow into a pre-aggregating
> TumblingWindow and a final SlidingWindow (with replication).
> See FLINK-4692 for more discussion.
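
To make approaches 1 and 3 from the description concrete, here is a rough sketch in plain Scala collections rather than the DataSet API. It assumes windows are aligned to multiples of the slide with no offset, uses SUM as the (combinable) aggregate, and treats records as `(timestamp, value)` pairs in minutes; all names (`windowStarts`, `slideByReplication`, `slideByPreAggregation`) are made up for illustration and are not part of the pull request.

```scala
object SlidingWindowSketch {

  /** Start timestamps of every sliding window [start, start + size) that contains `ts`. */
  def windowStarts(ts: Long, size: Long, slide: Long): Vector[Long] = {
    val lastStart = ts - Math.floorMod(ts, slide)
    Iterator.iterate(lastStart)(_ - slide).takeWhile(_ > ts - size).toVector
  }

  /** Replicates each record into all windows it falls into, then sums per window start. */
  private def sumByWindow(records: Seq[(Long, Long)], size: Long, slide: Long): Map[Long, Long] =
    records
      .flatMap { case (ts, v) => windowStarts(ts, size, slide).map(start => (start, v)) }
      .groupBy(_._1)
      .map { case (start, vs) => start -> vs.map(_._2).sum }

  /** Approach 1: replicate every input record. Works for any aggregate, but multiplies the data. */
  def slideByReplication(events: Seq[(Long, Long)], size: Long, slide: Long): Map[Long, Long] =
    sumByWindow(events, size, slide)

  @annotation.tailrec
  private def gcd(a: Long, b: Long): Long = if (b == 0) a else gcd(b, a % b)

  /** Approach 3 (hybrid): pre-aggregate into tumbling panes, then replicate only the pane results. */
  def slideByPreAggregation(events: Seq[(Long, Long)], size: Long, slide: Long): Map[Long, Long] = {
    // Largest tumbling pane from which every sliding window can be assembled.
    val pane = gcd(size, slide)
    val paneSums: Map[Long, Long] = events
      .groupBy { case (ts, _) => ts - Math.floorMod(ts, pane) }
      .map { case (paneStart, vs) => paneStart -> vs.map(_._2).sum }
    // toSeq first: flat-mapping a Map directly would collapse entries with equal keys.
    sumByWindow(paneSums.toSeq, size, slide)
  }
}
```

For Slide(10 minutes, 2 minutes) the pane size is 2 minutes, so each pane result is replicated into 5 windows instead of each individual record. Both functions should return the same sums per window start; the hybrid variant only pays off when many records share a pane.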
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)