UBarney commented on code in PR #15954:
URL: https://github.com/apache/datafusion/pull/15954#discussion_r2074906893
##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -751,28 +771,16 @@ impl AggregateExec {
                 })
             }
             _ => {
-                // When the input row count is 0 or 1, we can adopt that statistic keeping its reliability.
+                // When the input row count is 1, we can adopt that statistic keeping its reliability.
                 // When it is larger than 1, we degrade the precision since it may decrease after aggregation.
-                let num_rows = if let Some(value) = self
-                    .input()
-                    .partition_statistics(None)?
-                    .num_rows
-                    .get_value()
+                let num_rows = if let Some(value) = child_statistics.num_rows.get_value()
                 {
-                    if *value > 1 {
-                        self.input()
-                            .partition_statistics(None)?
-                            .num_rows
-                            .to_inexact()
-                    } else if *value == 0 {
-                        // Aggregation on an empty table creates a null row.

Review Comment:
* If `!group_by_expr.is_empty()` and `input_statistics.num_rows == 0`:
  * Both `Partial` and `Final` aggregation modes (`agg.mode`) yield 0 output rows (note the AggregateExec metric `output_rows=0`):

```
> explain analyze select count(*) from generate_series(0) where value > 10 group by value;
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                                                    |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[count(Int64(1))@1 as count(*)], metrics=[output_rows=0, elapsed_compute=24ns]                                                                                                                                     |
|                   |   AggregateExec: mode=FinalPartitioned, gby=[value@0 as value], aggr=[count(Int64(1))], metrics=[output_rows=0, elapsed_compute=100.016µs, spill_count=0, spilled_bytes=0, spilled_rows=0, peak_mem_used=1536]                          |
|                   |     CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=0, elapsed_compute=452ns]                                                                                                                                         |
|                   |       RepartitionExec: partitioning=Hash([value@0], 24), input_partitions=24, metrics=[fetch_time=10.544607ms, repartition_time=24ns, send_time=576ns]                                                                                  |
|                   |         AggregateExec: mode=Partial, gby=[value@0 as value], aggr=[count(Int64(1))], metrics=[output_rows=0, elapsed_compute=170.537µs, spill_count=0, spilled_bytes=0, spilled_rows=0, skipped_aggregation_rows=0, peak_mem_used=1536] |
|                   |           CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=0, elapsed_compute=663ns]                                                                                                                                   |
|                   |             FilterExec: value@0 > 10, metrics=[output_rows=0, elapsed_compute=2.201314ms]                                                                                                                                               |
|                   |               RepartitionExec: partitioning=RoundRobinBatch(24), input_partitions=1, metrics=[fetch_time=3.077µs, repartition_time=1ns, send_time=1.16µs]                                                                               |
|                   |                 LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=0, batch_size=8192], metrics=[]                                                                                                           |
|                   |                                                                                                                                                                                                                                         |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.

> select count(*) from generate_series(0) where value > 10 group by value;
+----------+
| count(*) |
+----------+
+----------+
0 row(s) fetched.
```

* If `group_by_expr.is_empty()` and `input_statistics.num_rows == 0`:
  * `Final` aggregation mode (`agg.mode == Final`) yields 1 output row. However, that case already returns earlier, so it never reaches this line.
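The following is a minimal Rust sketch of the case analysis above, for illustration only. It rests on two assumptions, neither of which is DataFusion's actual API:

* a simplified local `Precision` enum, modeled loosely on DataFusion's row-count statistic rather than its real type;
* a hypothetical `estimate_output_rows` helper.

The real logic lives in `AggregateExec` in `datafusion/physical-plan/src/aggregates/mod.rs`. As in the diff, the final-mode case without GROUP BY returns early with exactly one row. The remaining arm keeps an input count of 0 or 1 exact and degrades larger counts to inexact.

```rust
/// Simplified local stand-in for a row-count statistic (assumption: modeled
/// loosely on DataFusion's `Precision`, keeping only what this sketch needs).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Precision {
    /// Returns the value, if any, whether exact or estimated.
    fn get_value(&self) -> Option<&usize> {
        match self {
            Precision::Exact(v) | Precision::Inexact(v) => Some(v),
            Precision::Absent => None,
        }
    }

    /// Keeps the value but marks it as an estimate.
    fn to_inexact(self) -> Self {
        match self {
            Precision::Exact(v) => Precision::Inexact(v),
            other => other,
        }
    }
}

/// Hypothetical subset of aggregation modes used in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Partial,
    Final,
    FinalPartitioned,
}

/// Hypothetical helper: estimates the output row count of an aggregation
/// from its input row count, following the case analysis in the comment.
fn estimate_output_rows(mode: Mode, group_by_is_empty: bool, input_rows: Precision) -> Precision {
    match mode {
        // Final mode without GROUP BY: always exactly one output row, even for
        // empty input. This arm returns before the `_` arm is reached.
        Mode::Final | Mode::FinalPartitioned if group_by_is_empty => Precision::Exact(1),
        // Every other case, e.g. any mode with GROUP BY. `Partial` without
        // GROUP BY also lands here, but the review comment does not cover it.
        _ => match input_rows.get_value() {
            // More than one input row: grouping may collapse rows, so only an estimate.
            Some(&n) if n > 1 => input_rows.to_inexact(),
            // 0 rows in -> 0 groups out; 1 row in -> 1 group out: keep as-is.
            Some(_) => input_rows,
            None => Precision::Absent,
        },
    }
}
```

Keeping 0 and 1 exact is safe with GROUP BY because every output group comes from at least one input row. So 0 input rows produce 0 groups, and 1 input row produces exactly 1 group. More than one row can collapse into anywhere from 1 to that many groups, which is why larger counts are downgraded. The sketch makes no claim about `Partial` mode without GROUP BY.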