asolimando opened a new pull request, #21050:
URL: https://github.com/apache/datafusion/pull/21050

   ## Which issue does this PR close?
   
   Fixes CI breakage on `main` introduced by #19957.
   
   ## Rationale for this change
   
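   #19957 introduced extraction of NDV (number of distinct values) from Parquet metadata. The optimizer now sees NDV=1 for `HitColor`, `BrowserCountry`, and `BrowserLanguage` in the clickbench test file and short-circuits `COUNT(DISTINCT)` on those columns to a constant projection, skipping the full table scan. The expected EXPLAIN plan in `clickbench.slt` still described the old plan, so the test failed on `main`.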
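
The decision described above can be sketched in a few lines of Rust. This is only an illustration, not DataFusion's actual code: the `Precision` and `Plan` types and the `plan_count_distinct` function are hypothetical stand-ins defined here, and the rule that only an exact NDV permits the rewrite is an assumption made for the sketch.

```rust
/// Hypothetical stand-in for a column statistic that may be exact,
/// estimated, or missing (defined locally; not the DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(u64),
    Inexact(u64),
    Absent,
}

/// Simplified plan shapes, loosely mirroring the two EXPLAIN outputs.
#[derive(Debug, PartialEq)]
enum Plan {
    /// Constant projection: one literal per COUNT(DISTINCT col).
    ConstantProjection(Vec<u64>),
    /// Aggregate over a scan that reads the listed columns.
    AggregateOverScan(Vec<String>),
}

/// Assumption for this sketch: the rewrite applies only when every
/// requested column has an exact NDV; otherwise the scan is kept.
fn plan_count_distinct(columns: &[(&str, Precision)]) -> Plan {
    // Collecting into Option<Vec<_>> yields None as soon as any
    // column lacks an exact NDV.
    let exact: Option<Vec<u64>> = columns
        .iter()
        .map(|(_, ndv)| match ndv {
            Precision::Exact(n) => Some(*n),
            _ => None,
        })
        .collect();

    match exact {
        Some(values) => Plan::ConstantProjection(values),
        None => Plan::AggregateOverScan(
            columns.iter().map(|(name, _)| name.to_string()).collect(),
        ),
    }
}
```

With an exact NDV of 1 for all three columns, the sketch returns a constant projection of `[1, 1, 1]`, which corresponds to the `1 as count(DISTINCT ...)` expressions in the new plan. If any column's statistic is inexact or absent, it falls back to the aggregate-over-scan shape.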
   
   ## What changes are included in this PR?
   
   Updates the expected EXPLAIN plan in `clickbench.slt` to match the new (better) physical plan:
   
   ```diff
   -   01)AggregateExec: mode=Single, gby=[], aggr=[count(DISTINCT hits.HitColor), ...]
   -   02)--DataSourceExec: file_groups={1 group: [...]}, projection=[HitColor, BrowserLanguage, BrowserCountry], file_type=parquet
   +   01)ProjectionExec: expr=[1 as count(DISTINCT hits.HitColor), 1 as count(DISTINCT hits.BrowserCountry), 1 as count(DISTINCT hits.BrowserLanguage)]
   +   02)--PlaceholderRowExec
   ```
   
   ## Are these changes tested?
   
   This PR *is* the test fix. Verified locally with `cargo test --profile ci -p datafusion-sqllogictest --test sqllogictests`.
   
   ## Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

