jayzhan211 commented on code in PR #13564:
URL: https://github.com/apache/datafusion/pull/13564#discussion_r1865778983
##########
datafusion/core/tests/fuzz_cases/aggregation_fuzzer/data_generator.rs:
##########
@@ -87,7 +87,12 @@ impl DatasetGeneratorConfig {
.iter()
.filter_map(|d| {
if d.column_type.is_numeric()
- && !matches!(d.column_type, DataType::Float32 | DataType::Float64)
+ && !matches!(
Review Comment:
`sum` returns Decimal128(38, 3) on the normal path and Decimal128(30, 3) on the grouping (`DISTINCT`) path. Not sure why this shows up as a mismatch in the fuzz test. We should either align the precision for both cases or, if the two do not need to have the same precision (as in the slt below), fix the fuzz test's schema check.
```
query TT
select arrow_typeof(sum(column1)), arrow_typeof(sum(distinct column1)) from t group by column2;
----
Decimal128(38, 3) Decimal128(30, 3)
Decimal128(38, 3) Decimal128(30, 3)

query TT
explain select sum(column1), sum(distinct column1) from t group by column2;
----
logical_plan
01)Projection: sum(alias2) AS sum(t.column1), sum(alias1) AS sum(DISTINCT t.column1)
02)--Aggregate: groupBy=[[t.column2]], aggr=[[sum(alias2), sum(alias1)]]
03)----Aggregate: groupBy=[[t.column2, t.column1 AS alias1]], aggr=[[sum(t.column1) AS alias2]]
04)------TableScan: t projection=[column1, column2]
physical_plan
01)ProjectionExec: expr=[sum(alias2)@1 as sum(t.column1), sum(alias1)@2 as sum(DISTINCT t.column1)]
02)--AggregateExec: mode=FinalPartitioned, gby=[column2@0 as column2], aggr=[sum(alias2), sum(alias1)]
03)----CoalesceBatchesExec: target_batch_size=8192
04)------RepartitionExec: partitioning=Hash([column2@0], 4), input_partitions=4
05)--------AggregateExec: mode=Partial, gby=[column2@0 as column2], aggr=[sum(alias2), sum(alias1)]
06)----------AggregateExec: mode=FinalPartitioned, gby=[column2@0 as column2, alias1@1 as alias1], aggr=[alias2]
07)------------CoalesceBatchesExec: target_batch_size=8192
08)--------------RepartitionExec: partitioning=Hash([column2@0, alias1@1], 4), input_partitions=4
09)----------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
10)------------------AggregateExec: mode=Partial, gby=[column2@1 as column2, column1@0 as alias1], aggr=[alias2]
11)--------------------MemoryExec: partitions=1, partition_sizes=[1]
```
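The Rust sketch below gives one possible reading of these types. It rests on two assumptions. First, `sum` over `Decimal128(p, s)` returns `Decimal128(min(38, p + 10), s)`. Second, `column1` is `Decimal128(20, 3)`. The table definition is not shown, but 20 is the only input precision that yields `Decimal128(30, 3)` under that rule. `Decimal128`, `plain_sum_type`, `distinct_sum_type` and `same_type_ignoring_precision` are hypothetical stand-ins, not DataFusion or arrow APIs.

According to the logical plan, `sum(column1)` becomes `sum(alias2)` over `sum(t.column1) AS alias2`, so the type is widened twice. `sum(DISTINCT column1)` becomes `sum(alias1)` over the raw grouped column, so it is widened only once.

```rust
/// Maximum precision of a 128-bit Arrow decimal.
const DECIMAL128_MAX_PRECISION: u8 = 38;

/// Hypothetical stand-in for arrow's `DataType::Decimal128(precision, scale)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Decimal128 {
    precision: u8,
    scale: i8,
}

/// Assumed widening rule for `sum` over a decimal input:
/// precision grows by 10 (capped at 38), scale is unchanged.
fn sum_return_type(input: Decimal128) -> Decimal128 {
    Decimal128 {
        precision: DECIMAL128_MAX_PRECISION.min(input.precision + 10),
        scale: input.scale,
    }
}

/// `sum(column1)`: partial `sum(t.column1) AS alias2`, then `sum(alias2)`,
/// so the widening rule is applied twice.
fn plain_sum_type(column: Decimal128) -> Decimal128 {
    sum_return_type(sum_return_type(column))
}

/// `sum(DISTINCT column1)`: `sum(alias1)` where `alias1` is the grouped
/// raw column, so the widening rule is applied once.
fn distinct_sum_type(column: Decimal128) -> Decimal128 {
    sum_return_type(column)
}

/// Hypothetical relaxed schema check for the fuzz test: accept two decimal
/// types whose scales match, regardless of precision.
fn same_type_ignoring_precision(a: Decimal128, b: Decimal128) -> bool {
    a.scale == b.scale
}

fn main() {
    // Assumption: column1 is Decimal128(20, 3).
    let column1 = Decimal128 { precision: 20, scale: 3 };
    let plain = plain_sum_type(column1);
    let distinct = distinct_sum_type(column1);

    // Prints "Decimal128(38, 3) Decimal128(30, 3)", matching the slt output.
    println!(
        "Decimal128({}, {}) Decimal128({}, {})",
        plain.precision, plain.scale, distinct.precision, distinct.scale
    );

    // A strict comparison fails; the relaxed one passes.
    println!("strict equal: {}", plain == distinct);
    println!("relaxed equal: {}", same_type_ignoring_precision(plain, distinct));
}
```

The last helper corresponds to the second option, relaxing the fuzz schema check. The trade-off is that it would also hide a genuine precision bug, which argues for aligning the precision instead.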
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.