liujiayi771 commented on code in PR #5754:
URL: https://github.com/apache/incubator-gluten/pull/5754#discussion_r1618062532
##########
backends-velox/src/main/scala/org/apache/gluten/execution/HashAggregateExecTransformer.scala:
##########
@@ -241,21 +226,21 @@ abstract class HashAggregateExecTransformer(
}
aggregateFunction match {
- case hllAdapter: HLLAdapter =>
+ case _ if aggregateFunction.aggBufferAttributes.size > 1 =>
+ generateMergeCompanionNode()
+ case _ =>
aggregateMode match {
- case Partial =>
- // For Partial mode output type is binary.
+ case Partial | PartialMerge =>
val partialNode = ExpressionBuilder.makeAggregateFunction(
VeloxAggregateFunctionsBuilder.create(args, aggregateFunction,
aggregateMode),
childrenNodeList,
modeKeyWord,
ConverterUtils.getTypeNode(
- hllAdapter.inputAggBufferAttributes.head.dataType,
- hllAdapter.inputAggBufferAttributes.head.nullable)
+ aggregateFunction.inputAggBufferAttributes.head.dataType,
Review Comment:
@zhli1142015 Gluten searches agg functions in Velox based on the raw input
and final output signatures of the agg function. I think using
`inputAggBufferAttributes.head` to construct the `aggFuncNode` for partial agg
is inaccurate. It would be more appropriate to follow the original `dataType`.
I searched through the Spark code and found that, for most agg functions,
the `dataType` and `inputAggBufferAttributes.head` are the same. However, for
decimal avg, the `inputAggBufferAttributes.head` is sum `DecimalType`, and
while the final output `resultType` is also `DecimalType`, the precision and
scale are different.
Even though Velox does not consider precision and scale when searching agg
functions, there might be other agg function types that are completely
different. For avg, if `inputAggBufferAttributes` contains count before sum,
then the two are entirely inconsistent.
Therefore, I believe using `dataType` is more appropriate as it aligns with
most agg functions.
cc @zhztheplayer.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]