liujiayi771 commented on code in PR #5216:
URL: https://github.com/apache/incubator-gluten/pull/5216#discussion_r1547170167
##########
backends-velox/src/main/scala/org/apache/gluten/execution/HashAggregateExecTransformer.scala:
##########
@@ -146,7 +146,7 @@ abstract class HashAggregateExecTransformer(
     val (sparkOrders, sparkTypes) =
       aggFunc.aggBufferAttributes.map(attr => (attr.name, attr.dataType)).unzip
     val veloxOrders = VeloxIntermediateData.veloxIntermediateDataOrder(aggFunc)
-    val adjustedOrders = sparkOrders.map(veloxOrders.indexOf(_))
+    val adjustedOrders = sparkOrders.map(VeloxIntermediateData.getAttrIndex(veloxOrders, _))
Review Comment:
This change supports one more case:

> Agg functions with inconsistent ordering of intermediate data between Velox and Spark. The
> strings in the Seq come from the aggBufferAttributes of Spark's aggregate function, and they
> are arranged in the order of fields in Velox's Accumulator. The reason for using a
> two-dimensional Seq is that in some cases, a field in Velox will be mapped to multiple
> Attributes in Spark's aggBufferAttributes. For example, the fourth field of Velox's RegrSlope
> Accumulator is mapped to both xAvg and avg in Spark's RegrSlope aggBufferAttributes. In this
> scenario, when passing the output of Spark's partial aggregation to Velox, we only need to
> take one of them.

`VeloxIntermediateData.getAttrIndex` is used to look up an attribute's index in the two-dimensional Seq.
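
For illustration, here is a minimal sketch of how such a lookup could work. It is not the actual Gluten implementation: the object name `AttrIndexSketch`, the field names `f0`, `f1`, `f2`, and the ordering are hypothetical. Only the mapping of the fourth field to both `xAvg` and `avg` comes from the description above.

```scala
// Hypothetical stand-in for VeloxIntermediateData.getAttrIndex; not the real Gluten code.
object AttrIndexSketch {
  // Outer Seq: Velox accumulator fields, in Velox order.
  // Inner Seq: the Spark aggBufferAttributes names mapped to that field.
  // f0/f1/f2 are made-up names; only the xAvg/avg pairing is from the comment above.
  val veloxOrders: Seq[Seq[String]] = Seq(
    Seq("f0"),
    Seq("f1"),
    Seq("f2"),
    Seq("xAvg", "avg") // fourth Velox field maps to two Spark attributes
  )

  // Index of the first Velox field whose name list contains attr, or -1 if none does.
  def getAttrIndex(orders: Seq[Seq[String]], attr: String): Int =
    orders.indexWhere(_.contains(attr))

  def main(args: Array[String]): Unit = {
    // Hypothetical Spark-side attribute order.
    val sparkOrders = Seq("f0", "xAvg", "f1", "avg", "f2")
    val adjustedOrders = sparkOrders.map(getAttrIndex(veloxOrders, _))
    println(adjustedOrders)
  }
}
```

In this sketch both `xAvg` and `avg` resolve to the same Velox field index, so the caller can keep either one when passing Spark's partial aggregation output to Velox.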
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]