Re: [PR] feat: Add GroupColumn `Decimal128Array` [datafusion]

via GitHub Mon, 02 Dec 2024 04:40:43 -0800


jayzhan211 commented on code in PR #13564:
URL: https://github.com/apache/datafusion/pull/13564#discussion_r1865778983



##########
datafusion/core/tests/fuzz_cases/aggregation_fuzzer/data_generator.rs:
##########
@@ -87,7 +87,12 @@ impl DatasetGeneratorConfig {
             .iter()
             .filter_map(|d| {
                 if d.column_type.is_numeric()
-                    && !matches!(d.column_type, DataType::Float32 | DataType::Float64)
+                    && !matches!(

Review Comment:
   
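   The plain `sum` returns `Decimal128(38, 3)`, but `sum(DISTINCT ...)` returns `Decimal128(30, 3)`. The planner rewrites `sum(DISTINCT ...)` into an extra grouping column. I'm not sure why the fuzz test hits this mismatch. We should either align the precision in both cases or, if the two don't need the same precision (the slt below already accepts different precisions), fix the schema check in the fuzz test.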
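   Here is a minimal sketch of the second option. The fuzz test's real comparison code isn't shown here, so `types_compatible`, `schemas_compatible` and the pared-down `DataType` enum below are all hypothetical stand-ins. The enum mirrors only a few variants of arrow's `DataType`, which keeps the example dependency-free. The idea is to treat two `Decimal128` types as equal when their scales match and to ignore precision.

   ```rust
   /// Hypothetical, pared-down stand-in for arrow's `DataType`;
   /// only the variants needed for this sketch are modelled.
   #[allow(dead_code)]
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum DataType {
       Int64,
       Float64,
       Decimal128(u8, i8), // (precision, scale)
   }

   /// Hypothetical relaxed check: types match if they are equal, or if both
   /// are `Decimal128` with the same scale (precision is allowed to differ).
   fn types_compatible(expected: DataType, actual: DataType) -> bool {
       match (expected, actual) {
           (DataType::Decimal128(_, s1), DataType::Decimal128(_, s2)) => s1 == s2,
           _ => expected == actual,
       }
   }

   /// Hypothetical schema comparison over (column name, type) pairs:
   /// names must match exactly, types only need to be compatible.
   fn schemas_compatible(expected: &[(&str, DataType)], actual: &[(&str, DataType)]) -> bool {
       expected.len() == actual.len()
           && expected
               .iter()
               .zip(actual)
               .all(|((n1, t1), (n2, t2))| n1 == n2 && types_compatible(*t1, *t2))
   }

   fn main() {
       // Result types taken from the slt output: plain sum vs sum(DISTINCT).
       let plain = [("sum", DataType::Decimal128(38, 3))];
       let distinct = [("sum", DataType::Decimal128(30, 3))];
       println!("compatible: {}", schemas_compatible(&plain, &distinct));
   }
   ```

   The check compares on scale only. Scale determines how the stored integers are interpreted, so values with the same scale still compare meaningfully. Precision only bounds the representable range.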
   
   ```
   query TT
   select arrow_typeof(sum(column1)), arrow_typeof(sum(distinct column1)) from t group by column2;
   ----
   Decimal128(38, 3) Decimal128(30, 3)
   Decimal128(38, 3) Decimal128(30, 3)
   
   query TT
   explain select sum(column1), sum(distinct column1) from t group by column2;
   ----
   logical_plan
   01)Projection: sum(alias2) AS sum(t.column1), sum(alias1) AS sum(DISTINCT t.column1)
   02)--Aggregate: groupBy=[[t.column2]], aggr=[[sum(alias2), sum(alias1)]]
   03)----Aggregate: groupBy=[[t.column2, t.column1 AS alias1]], aggr=[[sum(t.column1) AS alias2]]
   04)------TableScan: t projection=[column1, column2]
   physical_plan
   01)ProjectionExec: expr=[sum(alias2)@1 as sum(t.column1), sum(alias1)@2 as sum(DISTINCT t.column1)]
   02)--AggregateExec: mode=FinalPartitioned, gby=[column2@0 as column2], aggr=[sum(alias2), sum(alias1)]
   03)----CoalesceBatchesExec: target_batch_size=8192
   04)------RepartitionExec: partitioning=Hash([column2@0], 4), input_partitions=4
   05)--------AggregateExec: mode=Partial, gby=[column2@0 as column2], aggr=[sum(alias2), sum(alias1)]
   06)----------AggregateExec: mode=FinalPartitioned, gby=[column2@0 as column2, alias1@1 as alias1], aggr=[alias2]
   07)------------CoalesceBatchesExec: target_batch_size=8192
   08)--------------RepartitionExec: partitioning=Hash([column2@0, alias1@1], 4), input_partitions=4
   09)----------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
   10)------------------AggregateExec: mode=Partial, gby=[column2@1 as column2, column1@0 as alias1], aggr=[alias2]
   11)--------------------MemoryExec: partitions=1, partition_sizes=[1]
   ```
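   The plan above suggests one possible explanation for the 38 vs 30 difference. The sketch below rests on two assumptions. First, `sum` over `Decimal128(p, s)` returns `Decimal128(min(38, p + 10), s)`. Second, `column1` is `Decimal128(20, 3)`. The table definition isn't shown, but 20 is the only input precision that yields `Decimal128(30, 3)` under that rule. In the plan, `sum(column1)` becomes `sum(alias2)` over a partial `sum(t.column1) AS alias2`, so the rule is applied twice. `sum(DISTINCT column1)` becomes `sum(alias1)` over the raw column, so it is widened only once. The code below just does that arithmetic; `sum_decimal_type` is a hypothetical helper.

   ```rust
   /// Maximum precision of a 128-bit decimal.
   const DECIMAL128_MAX_PRECISION: u8 = 38;

   /// Assumed return-type rule for `sum` over `Decimal128(p, s)`:
   /// widen precision by 10, capped at 38; scale is unchanged.
   fn sum_decimal_type(precision: u8, scale: i8) -> (u8, i8) {
       (DECIMAL128_MAX_PRECISION.min(precision.saturating_add(10)), scale)
   }

   fn main() {
       // Hypothetical input type for `column1`; the slt table definition is not shown.
       let column1: (u8, i8) = (20, 3);

       // sum(column1) is planned as sum(alias2) with alias2 = sum(t.column1),
       // so the widening rule is applied twice.
       let alias2 = sum_decimal_type(column1.0, column1.1);
       let plain_sum = sum_decimal_type(alias2.0, alias2.1);

       // sum(DISTINCT column1) is planned as sum(alias1) with alias1 = t.column1,
       // so the widening rule is applied once.
       let distinct_sum = sum_decimal_type(column1.0, column1.1);

       println!("sum: Decimal128{:?}", plain_sum);
       println!("sum(DISTINCT): Decimal128{:?}", distinct_sum);
   }
   ```

   Under these assumptions it prints `sum: Decimal128(38, 3)` and `sum(DISTINCT): Decimal128(30, 3)`, which matches the slt output.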



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

