alamb commented on code in PR #21681:
URL: https://github.com/apache/datafusion/pull/21681#discussion_r3095283572


##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -8951,3 +8951,33 @@ GROUP BY id ORDER BY id;
 
 statement ok
 DROP TABLE first_last_value_str_tests;
+
+# Regression test for incorrect MIN/MAX folding from projected expression
+# statistics. The PR branch `aggregate-stats-single-mode-and-cast` rewrites this
+# query to unattainable literals using parquet min/max envelopes for UserID and
+# ClientIP.
+statement ok
+SET datafusion.execution.target_partitions = 1;
+
+statement ok
+CREATE EXTERNAL TABLE hits_raw
+STORED AS PARQUET
+LOCATION '../core/tests/data/clickbench_hits_10.parquet';
+
+query II

Review Comment:
   This is a test for a query that is optimized to use statistics in 
https://github.com/apache/datafusion/pull/21651/changes
   
   
   On main, DataFusion computes delta from the actual rows and then takes the 
real min/max.
   
   On https://github.com/apache/datafusion/pull/21651, the new logic propagates 
exact column min/max through the projection using interval arithmetic, then 
aggregate_statistics treats those derived bounds as exact aggregate answers and 
replaces the whole aggregate with literals.
   
   For UserID - ClientIP, the interval formed from independent column extrema 
is wider than the set of values that actually occur in the data: the row with 
the min UserID is not the row with the max ClientIP, and likewise the row with 
the max UserID is not the row with the min ClientIP. The optimizer therefore 
folds to unattainable values:
   
     - wrong folded min: -2461439047704734435
     - wrong folded max: 7418527521343057109
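
   To make the failure mode concrete, here is a minimal sketch in Python rather than DataFusion code. The three rows are made up for illustration and are not taken from clickbench_hits_10.parquet; the names `rows`, `actual_*`, and `folded_*` are hypothetical. It compares the per-row min/max of `UserID - ClientIP` with the bounds obtained by subtracting independent column extrema:

```python
# Hypothetical (user_id, client_ip) rows; NOT values from the parquet file.
rows = [(10, 1), (20, 9), (30, 5)]

user_ids = [u for u, _ in rows]
client_ips = [c for _, c in rows]

# Evaluate the expression per row, then aggregate (what happens on main).
deltas = [u - c for u, c in rows]
actual_min, actual_max = min(deltas), max(deltas)

# Interval arithmetic on independent column extrema:
# [a_lo, a_hi] - [b_lo, b_hi] = [a_lo - b_hi, a_hi - b_lo]
folded_min = min(user_ids) - max(client_ips)
folded_max = max(user_ids) - min(client_ips)

print("actual:", actual_min, actual_max)
print("folded:", folded_min, folded_max)
```

   The folded interval always contains the actual range, so it is a valid bound, but its endpoints are only attained when a single row holds both of the extrema being combined. In this sketch neither folded endpoint appears among the per-row differences, which is why such bounds cannot be treated as exact MIN/MAX answers.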
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

