[ 
https://issues.apache.org/jira/browse/SPARK-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556588#comment-14556588
 ] 

Herman van Hovell tot Westerflier commented on SPARK-4233:
----------------------------------------------------------

Hi, 

I have looked through the code in the PR. The new interface doesn't look 
simpler to me. It seems that it has been designed with Hive UDAFs in mind.

Can you explain to me why the current UDAF implementation is complicated, why 
it needs to change, and what is improved if we start to use the proposed 
implementation?

As for the distinct implementations: why not nest the required aggregation 
operator in a distinct operator? For instance:
{code}
case class DistinctifyFunction(
    @transient expr: Seq[Expression],
    @transient aggr: AggregateFunction,
    @transient base: AggregateExpression)
  extends AggregateFunction {

  def this() = this(null, null, null) // Required for serialization.

  val seen = new OpenHashSet[Any]()

  @transient
  val distinctValue = new InterpretedProjection(expr)

  override def update(input: Row): Unit = {
    val evaluatedExpr = distinctValue(input)
    if (!evaluatedExpr.anyNull) {
      seen.add(evaluatedExpr)
    }
  }

  override def eval(input: Row): Any = {
    // Assume the AggregateFunction input has been rerouted to the distinct
    // value projection.
    seen.foreach(aggr.update(_))
    aggr.eval(input)
  }
}
{code}
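
To make the nesting idea concrete outside of Catalyst, here is a minimal, self-contained sketch. It does not use any Spark classes: `SimpleAggregate`, `SumAggregate`, `DistinctAggregate` and `DistinctDemo` are hypothetical names made up for illustration, and `Option` stands in for the null check on the evaluated row. The wrapper only collects unique inputs during `update` and replays them into the nested aggregate at `eval` time.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for an aggregate function.
// This is NOT Spark's AggregateFunction; it only mirrors update/eval.
trait SimpleAggregate[In, Out] {
  def update(input: In): Unit
  def eval(): Out
}

// A plain (non-distinct) sum over Long values.
final class SumAggregate extends SimpleAggregate[Long, Long] {
  private var total = 0L
  override def update(input: Long): Unit = total += input
  override def eval(): Long = total
}

// Wraps any aggregate: records each non-null input once, then feeds the
// unique values to the wrapped aggregate when the result is requested.
final class DistinctAggregate[In, Out](inner: SimpleAggregate[In, Out])
    extends SimpleAggregate[Option[In], Out] {

  private val seen = mutable.LinkedHashSet.empty[In]
  private var replayed = false

  // None plays the role of a null input and is skipped.
  override def update(input: Option[In]): Unit = input.foreach(seen += _)

  override def eval(): Out = {
    // Replay only once, so that calling eval twice does not double count.
    if (!replayed) {
      seen.foreach(inner.update)
      replayed = true
    }
    inner.eval()
  }
}

object DistinctDemo {
  // Sums each distinct non-null value exactly once.
  def distinctSum(values: Seq[Option[Long]]): Long = {
    val agg = new DistinctAggregate(new SumAggregate)
    values.foreach(agg.update)
    agg.eval()
  }
}
```

With this shape, `DistinctDemo.distinctSum(Seq(Some(1L), Some(2L), Some(2L), None, Some(3L)))` returns 6. The `replayed` flag is an addition of this sketch; it guards against replaying the buffered values more than once, which is something a real implementation along these lines would probably need to consider as well.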

> Simplify the Aggregation Function implementation
> ------------------------------------------------
>
>                 Key: SPARK-4233
>                 URL: https://issues.apache.org/jira/browse/SPARK-4233
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Cheng Hao
>
> Currently, the UDAF implementation is quite complicated, and we have to 
> provide distinct & non-distinct version.



