andygrove opened a new issue, #1038:
URL: https://github.com/apache/datafusion-comet/issues/1038
### Describe the bug
## SQL
```
SELECT c79, c54, stddev_pop(c73) FROM test1 GROUP BY c79, c54 ORDER BY c79, c54;
```
`c79` is Byte (`tinyint`). `c54` is Float and `c73` is Double, per the `ReadSchema` in the plans below.
### Spark Plan
```
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(3) Sort [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST], true, 0
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 1
         +- Exchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=20613]
            +- *(2) HashAggregate(keys=[c79#279, c54#254], functions=[stddev_pop(c73#273)], output=[c79#279, c54#254, stddev_pop(c73)#28050])
               +- AQEShuffleRead coalesced
                  +- ShuffleQueryStage 0
                     +- Exchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, [plan_id=20585]
                        +- *(1) HashAggregate(keys=[c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], functions=[partial_stddev_pop(c73#273)], output=[c79#279, c54#254, n#28038, avg#28039, m2#28040])
                           +- *(1) ColumnarToRow
                              +- FileScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>
+- == Initial Plan ==
   Sort [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=20567]
      +- HashAggregate(keys=[c79#279, c54#254], functions=[stddev_pop(c73#273)], output=[c79#279, c54#254, stddev_pop(c73)#28050])
         +- Exchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, [plan_id=20564]
            +- HashAggregate(keys=[c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], functions=[partial_stddev_pop(c73#273)], output=[c79#279, c54#254, n#28038, avg#28039, m2#28040])
               +- FileScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>
```
### Comet Plan
```
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(1) ColumnarToRow
   +- CometSort [c79#279, c54#254, stddev_pop(c73)#28131], [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST]
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 1
            +- CometColumnarExchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=20746]
               +- !CometHashAggregate [c79#279, c54#254, n#28119, avg#28120, m2#28121], Final, [c79#279, c54#254], [stddev_pop(c73#273)]
                  +- AQEShuffleRead coalesced
                     +- ShuffleQueryStage 0
                        +- CometExchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=20701]
                           +- !CometHashAggregate [c54#254, c73#273, c79#279], Partial, [c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], [partial_stddev_pop(c73#273)]
                              +- CometScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>
+- == Initial Plan ==
   CometSort [c79#279, c54#254, stddev_pop(c73)#28131], [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST]
   +- CometColumnarExchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=20682]
      +- !CometHashAggregate [c79#279, c54#254, n#28119, avg#28120, m2#28121], Final, [c79#279, c54#254], [stddev_pop(c73#273)]
         +- CometExchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=20680]
            +- !CometHashAggregate [c54#254, c73#273, c79#279], Partial, [c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], [partial_stddev_pop(c73#273)]
               +- CometScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>
```
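The first difference between the Spark and Comet results is at row 4, where `stddev_pop(c73)` is `NaN` in Spark but `0.0` in Comet:

- Spark: `-127,0.31308997,NaN`
- Comet: `-127,0.31308997,0.0`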
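
For context on how a `NaN` versus `0.0` split can arise, the partial aggregates in both plans carry an `(n, avg, m2)` buffer. The Python sketch below is not Spark's or Comet's actual code. It assumes a Welford-style update and merge over that buffer (the helper names `update`, `merge`, and `stddev_pop` are hypothetical), and it assumes the affected group contains a `NaN` or infinite `c73` value, which the issue does not confirm. Under those assumptions, a non-finite input makes `m2` become `NaN`, so the population standard deviation is `NaN`, while a single finite value gives `0.0`.

```python
import math

def update(buf, x):
    """Welford-style update of an (n, avg, m2) buffer with one value (assumed algorithm)."""
    n, avg, m2 = buf
    n += 1.0
    delta = x - avg
    avg += delta / n
    m2 += delta * (x - avg)
    return (n, avg, m2)

def merge(a, b):
    """Combine two partial (n, avg, m2) buffers, as a final aggregate stage would."""
    n1, avg1, m2_1 = a
    n2, avg2, m2_2 = b
    n = n1 + n2
    if n == 0.0:
        return (0.0, 0.0, 0.0)
    delta = avg2 - avg1
    avg = avg1 + delta * n2 / n
    m2 = m2_1 + m2_2 + delta * delta * n1 * n2 / n
    return (n, avg, m2)

def stddev_pop(buf):
    """Population standard deviation from the buffer; None stands in for SQL NULL."""
    n, _, m2 = buf
    if n == 0.0:
        return None
    return math.sqrt(m2 / n)

EMPTY = (0.0, 0.0, 0.0)

# A group with one finite value: the deviation from the mean is zero.
finite_result = stddev_pop(update(EMPTY, 2.0))

# A group with one NaN or infinite value: m2 becomes NaN and stays NaN.
nan_result = stddev_pop(update(EMPTY, float("nan")))
inf_result = stddev_pop(update(EMPTY, float("inf")))

print(finite_result, nan_result, inf_result)
```

If the group at row 4 does contain a non-finite value, an implementation that returns `0.0` for it would be special-casing the result somewhere rather than letting `m2` propagate. Checking the raw `c73` values for `c79 = -127` in `test1.parquet` would confirm or rule this out.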
### Steps to reproduce
_No response_
### Expected behavior
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]