alamb commented on code in PR #21681:
URL: https://github.com/apache/datafusion/pull/21681#discussion_r3095283572
##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -8951,3 +8951,33 @@ GROUP BY id ORDER BY id;
 statement ok
 DROP TABLE first_last_value_str_tests;
+
+# Regression test for incorrect MIN/MAX folding from projected expression
+# statistics. The PR branch `aggregate-stats-single-mode-and-cast` rewrites this
+# query to unattainable literals using parquet min/max envelopes for UserID and
+# ClientIP.
+statement ok
+SET datafusion.execution.target_partitions = 1;
+
+statement ok
+CREATE EXTERNAL TABLE hits_raw
+STORED AS PARQUET
+LOCATION '../core/tests/data/clickbench_hits_10.parquet';
+
+query II

Review Comment:
This is a test for a query that is optimized to use statistics in https://github.com/apache/datafusion/pull/21651/changes

On main, DataFusion computes the delta from the actual rows and then takes the real min/max.

On https://github.com/apache/datafusion/pull/21651, the new logic propagates exact column min/max through the projection using interval arithmetic, then aggregate_statistics treats those derived bounds as exact aggregate answers and replaces the whole aggregate with literals.

For UserID - ClientIP, the interval formed from independent column extrema is wider than the set of values that actually occur in the data, because the min UserID is not paired with the max ClientIP in the same row, and similarly for the max side. The derived interval is a valid bound on the result, but it is not the exact min/max.

The optimizer therefore folds to unattainable values:
- wrong folded min: -2461439047704734435
- wrong folded max: 7418527521343057109
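
To make the gap between derived bounds and real extrema concrete, here is a minimal standalone sketch in plain Rust. It does not use any DataFusion API: the helper names (`bounds`, `interval_sub`, `rowwise_min_max`) and the three sample rows are made up for illustration and are not taken from `clickbench_hits_10.parquet`. It only shows that subtracting column extrema gives a valid but loose envelope for `a - b`.

```rust
/// Exact min/max of a single column (the kind of statistic parquet stores).
fn bounds(col: &[i64]) -> Option<(i64, i64)> {
    Some((*col.iter().min()?, *col.iter().max()?))
}

/// Interval arithmetic for `a - b`: [a_min - b_max, a_max - b_min].
/// Widened to i128 so the illustration cannot overflow.
fn interval_sub(a: (i64, i64), b: (i64, i64)) -> (i128, i128) {
    (a.0 as i128 - b.1 as i128, a.1 as i128 - b.0 as i128)
}

/// Real min/max of `a - b`, computed row by row.
fn rowwise_min_max(a: &[i64], b: &[i64]) -> Option<(i128, i128)> {
    let mut diffs = a.iter().zip(b).map(|(&x, &y)| x as i128 - y as i128);
    let first = diffs.next()?;
    Some(diffs.fold((first, first), |(lo, hi), d| (lo.min(d), hi.max(d))))
}

fn main() {
    // Hypothetical data: row i pairs user_id[i] with client_ip[i].
    let user_id = [10, 20, 30];
    let client_ip = [1, 15, 25];

    let derived = interval_sub(bounds(&user_id).unwrap(), bounds(&client_ip).unwrap());
    let actual = rowwise_min_max(&user_id, &client_ip).unwrap();

    println!("derived bounds: {:?}", derived); // (-15, 29)
    println!("actual min/max: {:?}", actual); // (5, 9)

    // The derived interval always contains the real answer...
    assert!(derived.0 <= actual.0 && actual.1 <= derived.1);
    // ...but here neither endpoint is a value that occurs in any row.
    assert_ne!(derived, actual);
}
```

In this toy data the smallest user_id (10) sits in the same row as the smallest client_ip (1), so the derived lower endpoint `10 - 25 = -15` never occurs in any row. That is the same effect as the wrong folded values above: the derived interval is only safe to use as an inexact bound, not as the aggregate result.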
