[ https://issues.apache.org/jira/browse/SPARK-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639921#comment-14639921 ]
Yin Huai commented on SPARK-4366:
---------------------------------
Here are brief instructions on how to implement a built-in aggregate function
that supports code-gen.
In our new aggregate function interface, {{AlgebraicAggregate}} is the
abstract class for all built-in aggregate functions that support code-gen.
Functions based on {{AlgebraicAggregate}} use our existing expressions to
implement four operations: initializing the aggregation buffer values, updating
the buffer, merging two buffers, and evaluating the result. A good example is
{{org.apache.spark.sql.catalyst.expressions.aggregate.Average}}. Since all
operations of an {{AlgebraicAggregate}} are built on top of our expression
system, the developer does not need to do anything special to support code-gen;
it works out of the box. For built-in functions that are hard to express with
our expressions, {{AggregateFunction2}} is the abstract class to use.
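To make the idea concrete, here is a minimal, self-contained sketch of an
average defined purely by expressions. It does not use Spark's classes: the
{{Expr}} ADT, the {{AlgebraicAgg}} trait, its member names, and the
{{left.}}/{{right.}} naming of merge inputs are all hypothetical stand-ins
chosen for illustration, only loosely modeled on the four operations described
above.

```scala
// Toy model only: NOT Spark's Catalyst API. All names here are hypothetical.
object AlgebraicSketch {
  // A tiny expression language standing in for Catalyst expressions.
  sealed trait Expr
  final case class Lit(value: Double) extends Expr
  final case class Ref(name: String) extends Expr // reads a named slot
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class Div(left: Expr, right: Expr) extends Expr

  type Row = Map[String, Double]

  // Interpreter; a code generator could walk the same tree instead.
  def eval(e: Expr, row: Row): Double = e match {
    case Lit(v)    => v
    case Ref(n)    => row(n)
    case Add(l, r) => eval(l, row) + eval(r, row)
    case Div(l, r) => eval(l, row) / eval(r, row)
  }

  // Hypothetical analogue of an algebraic aggregate: every operation is
  // declared as expressions over the buffer slots.
  trait AlgebraicAgg {
    def bufferSlots: Seq[String]
    def initialValues: Seq[Expr]     // one per slot, no references
    def updateExpressions: Seq[Expr] // may read the slots and "input"
    def mergeExpressions: Seq[Expr]  // may read "left.<slot>" and "right.<slot>"
    def evaluateExpression: Expr     // may read the slots
  }

  object Avg extends AlgebraicAgg {
    val bufferSlots: Seq[String] = Seq("sum", "count")
    val initialValues: Seq[Expr] = Seq(Lit(0.0), Lit(0.0))
    val updateExpressions: Seq[Expr] =
      Seq(Add(Ref("sum"), Ref("input")), Add(Ref("count"), Lit(1.0)))
    val mergeExpressions: Seq[Expr] =
      Seq(Add(Ref("left.sum"), Ref("right.sum")),
          Add(Ref("left.count"), Ref("right.count")))
    val evaluateExpression: Expr = Div(Ref("sum"), Ref("count"))
  }

  // Evaluates one expression per buffer slot and builds the new buffer.
  private def step(agg: AlgebraicAgg, exprs: Seq[Expr], scope: Row): Row =
    agg.bufferSlots.zip(exprs.map(eval(_, scope))).toMap

  def init(agg: AlgebraicAgg): Row =
    step(agg, agg.initialValues, Map.empty[String, Double])

  def update(agg: AlgebraicAgg, buffer: Row, input: Double): Row =
    step(agg, agg.updateExpressions, buffer + ("input" -> input))

  def merge(agg: AlgebraicAgg, left: Row, right: Row): Row = {
    val scope = left.map { case (k, v) => s"left.$k" -> v } ++
      right.map { case (k, v) => s"right.$k" -> v }
    step(agg, agg.mergeExpressions, scope)
  }

  def finish(agg: AlgebraicAgg, buffer: Row): Double =
    eval(agg.evaluateExpression, buffer)

  def main(args: Array[String]): Unit = {
    // Two partial aggregations, as if computed on separate partitions.
    val p1 = Seq(1.0, 2.0, 3.0).foldLeft(init(Avg))(update(Avg, _, _))
    val p2 = Seq(4.0, 5.0).foldLeft(init(Avg))(update(Avg, _, _))
    println(finish(Avg, merge(Avg, p1, p2))) // prints 3.0
  }
}
```

Because {{Avg}} contains no imperative code, only expression trees, any
facility that can process those trees (an interpreter here, a code generator in
principle) handles the aggregate without extra work from its author.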
For descriptions of aggregate functions, here are some references:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)
https://prestodb.io/docs/current/functions/aggregate.html
https://msdn.microsoft.com/en-us/library/ms173454.aspx
http://www.postgresql.org/docs/devel/static/functions-aggregate.html
> Aggregation Improvement
> -----------------------
>
> Key: SPARK-4366
> URL: https://issues.apache.org/jira/browse/SPARK-4366
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Cheng Hao
> Priority: Critical
> Attachments: aggregatefunction_v1.pdf
>
>
> This improvement actually includes a couple of sub-tasks.