[ https://issues.apache.org/jira/browse/FLINK-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176104#comment-15176104 ]
ASF GitHub Bot commented on FLINK-3474:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1746#discussion_r54763270

--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/runtime/aggregate/AggregateReduceGroupFunction.scala ---
@@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.table.runtime.aggregate
+
+import java.lang.Iterable
+
+import com.google.common.base.Preconditions
+import org.apache.flink.api.common.functions.{CombineFunction, RichGroupReduceFunction, RichMapPartitionFunction}
+import org.apache.flink.api.table.Row
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.util.Collector
+
+import scala.collection.JavaConversions._
+
+/**
+ * It wraps the aggregate logic inside of
+ * [[org.apache.flink.api.java.operators.GroupReduceOperator]].
+ *
+ * @param aggregates The aggregate functions.
+ * @param groupKeysMapping The index mapping of group keys between intermediate aggregate Row
+ *                         and output Row.
+ * @param aggregateMapping The index mapping between aggregate function list and aggregated value
+ *                         index in output Row.
+ */
+class AggregateReduceGroupFunction(
+    private val aggregates: Array[Aggregate[_ <: Any]],
+    private val groupKeysMapping: Array[(Int, Int)],
+    private val aggregateMapping: Array[(Int, Int)],
+    private val intermediateRowArity: Int)
+  extends RichGroupReduceFunction[Row, Row] {
+
+  private val finalRowLength: Int = groupKeysMapping.length + aggregateMapping.length
+  private var aggregateBuffer: Row = _
+  private var output: Row = _
+
+  override def open(config: Configuration) {
+    Preconditions.checkNotNull(aggregates)
+    Preconditions.checkNotNull(groupKeysMapping)
+    aggregateBuffer = new Row(intermediateRowArity)
+    output = new Row(finalRowLength)
+  }
+
+  /**
+   * For grouped intermediate aggregate Rows, merge all of them into aggregate buffer,
+   * calculate aggregated values output by aggregate buffer, and set them into output
+   * Row based on the mapping relation between intermediate aggregate data and output data.
+   *
+   * @param records Grouped intermediate aggregate Rows iterator.
+   * @param out The collector to hand results to.
+   *
+   */
+  override def reduce(records: Iterable[Row], out: Collector[Row]): Unit = {
+
+    // Initiate intermediate aggregate value.
+    aggregates.foreach(_.initiate(aggregateBuffer))
+
+    // Merge intermediate aggregate value to buffer.
+    var last: Row = null
+    records.foreach((record) => {
+      aggregates.foreach(_.merge(record, aggregateBuffer))
+      last = record
+    })
+
+    // Set group keys to aggregateBuffer.
+    for (i <- 0 until groupKeysMapping.length) {
--- End diff --

Is this necessary?
Looks like we copy the keys first into the `aggregateBuffer` and then into the `output` row. Can't we copy the keys directly from `last` to `output`?


> Partial aggregate interface design and sort-based implementation
> ----------------------------------------------------------------
>
>                 Key: FLINK-3474
>                 URL: https://issues.apache.org/jira/browse/FLINK-3474
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>
> The scope of this sub-task includes:
> # Partial aggregate interface.
> # Simple aggregate function implementations, such as SUM/AVG/COUNT/MIN/MAX.
> # DataSetAggregateRule, which translates logical Calcite aggregate nodes to Flink user functions. As a hash-based combiner is not available yet (see PR #1517), we would use sort-based combine as the default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
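For illustration, here is a minimal sketch of what the review comment suggests: write the group keys straight from the `last` input row into `output`, and use `aggregateBuffer` only for the aggregate values. This is not code from the PR. The tail of `reduce()` is cut off in the diff above, so the `(output index, intermediate index)` orientation of the mapping arrays, the `setField`/`productElement` accessors on `Row`, and the `Aggregate.evaluate(buffer)` call are assumptions reconstructed from the field documentation.

```scala
// Hypothetical rewrite of reduce() in the AggregateReduceGroupFunction shown
// in the diff above -- a sketch of the suggested change, not the PR's code.
override def reduce(records: Iterable[Row], out: Collector[Row]): Unit = {

  // Initiate the intermediate aggregate buffer.
  aggregates.foreach(_.initiate(aggregateBuffer))

  // Merge all intermediate Rows of the group into the buffer and remember
  // the last Row, whose key fields carry the group keys.
  var last: Row = null
  records.foreach { record =>
    aggregates.foreach(_.merge(record, aggregateBuffer))
    last = record
  }

  // Copy the group keys directly from `last` into the output Row,
  // skipping the extra copy into aggregateBuffer.
  // Assumes each pair is (index in output Row, index in intermediate Row).
  groupKeysMapping.foreach { case (outIndex, keyIndex) =>
    output.setField(outIndex, last.productElement(keyIndex))
  }

  // Evaluate the aggregates from the buffer into the output Row.
  // Assumes the Aggregate trait exposes an evaluate(buffer: Row) method.
  aggregateMapping.foreach { case (outIndex, aggIndex) =>
    output.setField(outIndex, aggregates(aggIndex).evaluate(aggregateBuffer))
  }

  out.collect(output)
}
```

If the mapping orientation holds, the group-key slots of `aggregateBuffer` would no longer be read in this function at all.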