Hi everyone,

I was playing around with the integration of Hive UDAFs in Spark SQL and noticed that the terminatePartial and merge methods of custom UDAFs are never called. This made me curious, since those two methods are the ones responsible for distributing UDAF execution in Hive. Looking at the code of HiveUdafFunction, which appears to be the wrapper for all native Hive functions that have no Spark-SQL-specific implementation, I noticed that it:
a) extends AggregateFunction rather than PartialAggregate, and
b) only calls iterate and evaluate on the underlying UDAFEvaluator object, never terminatePartial or merge.

My question is thus twofold:

1. Is my observation correct that, to achieve distributed execution of a UDAF, I have to add a custom implementation at the Spark SQL layer (like the examples in aggregates.scala)?
2. If that is the case, how difficult would it be to use the terminatePartial and merge methods provided by the UDAFEvaluator to make Hive UDAFs distributed by default? (I've sketched what I mean in the P.S. below.)

Cheers,
Daniel
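
P.S. To make question 2 concrete, here is a minimal, self-contained sketch of the method lifecycle I mean. SumEvaluator is a stand-in for a real org.apache.hadoop.hive.ql.exec.UDAFEvaluator implementation; the names and the driver loop are purely illustrative, not Hive or Spark source:

    class SumEvaluator {
      private var sum: Long = 0L
      def init(): Unit = sum = 0L
      def iterate(value: Long): Unit = sum += value    // map side, per row
      def terminatePartial(): Long = sum               // map side, per partition
      def merge(partial: Long): Unit = sum += partial  // reduce side, per partial
      def terminate(): Long = sum                      // reduce side, final result
    }

    object LocalLifecycle extends App {
      val partitions: Seq[Seq[Long]] = Seq(Seq(1L, 2L), Seq(3L, 4L, 5L))

      // Each partition produces a partial aggregate independently...
      val partials = partitions.map { rows =>
        val eval = new SumEvaluator
        eval.init()
        rows.foreach(eval.iterate)
        eval.terminatePartial()
      }

      // ...and a single evaluator merges the partials into the final answer.
      val finalEval = new SumEvaluator
      finalEval.init()
      partials.foreach(finalEval.merge)
      println(finalEval.terminate()) // 15
    }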
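
And here is how I imagine that contract mapping onto a distributed Spark job, expressed against plain RDDs rather than Catalyst, so this is only a conceptual sketch of what a PartialAggregate-style HiveUdafFunction would effectively have to do (reusing the hypothetical SumEvaluator from above):

    import org.apache.spark.{SparkConf, SparkContext}

    object DistributedLifecycle extends App {
      val sc = new SparkContext(
        new SparkConf().setMaster("local[2]").setAppName("udaf-sketch"))

      val values = sc.parallelize(1L to 100L, numSlices = 4)

      // Map side: run iterate + terminatePartial inside each partition,
      // so only one partial value per partition crosses the network.
      val partials = values.mapPartitions { it =>
        val eval = new SumEvaluator
        eval.init()
        it.foreach(eval.iterate)
        Iterator.single(eval.terminatePartial())
      }

      // Reduce side: merge the partial aggregates and terminate.
      val finalEval = new SumEvaluator
      finalEval.init()
      partials.collect().foreach(finalEval.merge)
      println(finalEval.terminate()) // 5050

      sc.stop()
    }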