[ https://issues.apache.org/jira/browse/FLINK-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176104#comment-15176104 ]
ASF GitHub Bot commented on FLINK-3474:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1746#discussion_r54763270

--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/runtime/aggregate/AggregateReduceGroupFunction.scala ---
@@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.table.runtime.aggregate
+
+import java.lang.Iterable
+
+import com.google.common.base.Preconditions
+import org.apache.flink.api.common.functions.{CombineFunction, RichGroupReduceFunction, RichMapPartitionFunction}
+import org.apache.flink.api.table.Row
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.util.Collector
+
+import scala.collection.JavaConversions._
+
+/**
+ * It wraps the aggregate logic inside of
+ * [[org.apache.flink.api.java.operators.GroupReduceOperator]].
+ *
+ * @param aggregates The aggregate functions.
+ * @param groupKeysMapping The index mapping of group keys between intermediate aggregate Row
+ *                         and output Row.
+ * @param aggregateMapping The index mapping between aggregate function list and aggregated value
+ *                         index in output Row.
+ */
+class AggregateReduceGroupFunction(
+    private val aggregates: Array[Aggregate[_ <: Any]],
+    private val groupKeysMapping: Array[(Int, Int)],
+    private val aggregateMapping: Array[(Int, Int)],
+    private val intermediateRowArity: Int)
+  extends RichGroupReduceFunction[Row, Row] {
+
+  private val finalRowLength: Int = groupKeysMapping.length + aggregateMapping.length
+  private var aggregateBuffer: Row = _
+  private var output: Row = _
+
+  override def open(config: Configuration) {
+    Preconditions.checkNotNull(aggregates)
+    Preconditions.checkNotNull(groupKeysMapping)
+    aggregateBuffer = new Row(intermediateRowArity)
+    output = new Row(finalRowLength)
+  }
+
+  /**
+   * For grouped intermediate aggregate Rows, merge all of them into aggregate buffer,
+   * calculate aggregated values output by aggregate buffer, and set them into output
+   * Row based on the mapping relation between intermediate aggregate data and output data.
+   *
+   * @param records Grouped intermediate aggregate Rows iterator.
+   * @param out The collector to hand results to.
+   *
+   */
+  override def reduce(records: Iterable[Row], out: Collector[Row]): Unit = {
+
+    // Initiate intermediate aggregate value.
+    aggregates.foreach(_.initiate(aggregateBuffer))
+
+    // Merge intermediate aggregate value to buffer.
+    var last: Row = null
+    records.foreach((record) => {
+      aggregates.foreach(_.merge(record, aggregateBuffer))
+      last = record
+    })
+
+    // Set group keys to aggregateBuffer.
+    for (i <- 0 until groupKeysMapping.length) {
--- End diff --

Is this necessary?
Looks like we copy the keys first into the `aggregateBuffer` and then into the `output` row. Can't we copy the keys directly from `last` to `output`?


> Partial aggregate interface design and sort-based implementation
> ----------------------------------------------------------------
>
>                 Key: FLINK-3474
>                 URL: https://issues.apache.org/jira/browse/FLINK-3474
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>
> The scope of this sub-task includes:
> # Partial aggregate interface.
> # Simple aggregate function implementations, such as SUM/AVG/COUNT/MIN/MAX.
> # DataSetAggregateRule, which translates logical Calcite aggregate nodes to Flink user functions. As a hash-based combiner is not available yet (see PR #1517), we would use sort-based combine as the default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
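For illustration, here is a minimal sketch of what the review comment suggests: write the group keys straight from the `last` input row into `output`, and use `aggregateBuffer` only for the aggregate values. This is not code from the PR. The tail of `reduce()` is cut off in the diff above, so the `(output index, intermediate index)` orientation of the mapping arrays, the `setField`/`productElement` accessors on `Row`, and the `Aggregate.evaluate(buffer)` call are assumptions reconstructed from the field documentation.

```scala
// Hypothetical rewrite of reduce() in the AggregateReduceGroupFunction shown
// in the diff above -- a sketch of the suggested change, not the PR's code.
override def reduce(records: Iterable[Row], out: Collector[Row]): Unit = {

  // Initiate the intermediate aggregate buffer.
  aggregates.foreach(_.initiate(aggregateBuffer))

  // Merge all intermediate Rows of the group into the buffer and remember
  // the last Row, whose key fields carry the group keys.
  var last: Row = null
  records.foreach { record =>
    aggregates.foreach(_.merge(record, aggregateBuffer))
    last = record
  }

  // Copy the group keys directly from `last` into the output Row,
  // skipping the extra copy into aggregateBuffer.
  // Assumes each pair is (index in output Row, index in intermediate Row).
  groupKeysMapping.foreach { case (outIndex, keyIndex) =>
    output.setField(outIndex, last.productElement(keyIndex))
  }

  // Evaluate the aggregates from the buffer into the output Row.
  // Assumes the Aggregate trait exposes an evaluate(buffer: Row) method.
  aggregateMapping.foreach { case (outIndex, aggIndex) =>
    output.setField(outIndex, aggregates(aggIndex).evaluate(aggregateBuffer))
  }

  out.collect(output)
}
```

If the mapping orientation holds, the group-key slots of `aggregateBuffer` would no longer be read in this function at all.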