korowa commented on code in PR #12438:
URL: https://github.com/apache/datafusion/pull/12438#discussion_r1755454746


##########
benchmarks/queries/clickbench/extended.sql:
##########
@@ -2,3 +2,4 @@ SELECT COUNT(DISTINCT "SearchPhrase"), COUNT(DISTINCT 
"MobilePhone"), COUNT(DIST
 SELECT COUNT(DISTINCT "HitColor"), COUNT(DISTINCT "BrowserCountry"), 
COUNT(DISTINCT "BrowserLanguage")  FROM hits;
 SELECT "BrowserCountry",  COUNT(DISTINCT "SocialNetwork"), COUNT(DISTINCT 
"HitColor"), COUNT(DISTINCT "BrowserLanguage"), COUNT(DISTINCT "SocialAction") 
FROM hits GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
 SELECT "SocialSourceNetworkID", "RegionID", COUNT(*), AVG("Age"), 
AVG("ParamPrice"), STDDEV("ParamPrice") as s, VAR("ParamPrice")  FROM hits 
GROUP BY "SocialSourceNetworkID", "RegionID" HAVING s IS NOT NULL ORDER BY s 
DESC LIMIT 10;
+SELECT MIN("ResponseStartTiming") tmin, MEDIAN("ResponseStartTiming") tmed, 
approx_percentile_cont("ResponseStartTiming", 0.95) tp95, 
approx_percentile_cont("ResponseStartTiming", 0.95) tp99, 
MAX("ResponseStartTiming") tmax,  "UserID" FROM hits GROUP BY "UserID" HAVING 
tmin > 0 AND tmed > 0 ORDER BY tp95 DESC LIMIT 10;

Review Comment:
   By default, partial aggregation is skipped only when `unique groups / 
input records > 0.8`, so the cardinality of grouping by `"UserID"` is likely 
too low to trigger it. Maybe it's worth grouping by `"WatchID", "ClientIP"` 
from q32 here instead, since that query clearly benefited from skipping 
partial aggregation?
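
To illustrate the ratio check mentioned above, here is a minimal sketch in Python. It is not DataFusion's implementation: the function names `group_ratio` and `should_skip_partial_aggregation` and the sample data are hypothetical, and any row-count probing the engine applies before making the decision is left out. Only the `0.8` threshold comes from the comment.

```python
def group_ratio(group_keys):
    """Return (unique groups) / (input records) for an iterable of group keys."""
    keys = list(group_keys)
    if not keys:
        return 0.0
    return len(set(keys)) / len(keys)


def should_skip_partial_aggregation(group_keys, ratio_threshold=0.8):
    """Hypothetical helper: skip when the ratio strictly exceeds the threshold."""
    return group_ratio(group_keys) > ratio_threshold


# Made-up sample: a low-cardinality key, standing in for "UserID".
user_ids = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4]  # 4 unique keys over 10 rows

# Made-up sample: a nearly unique composite key, standing in for
# ("WatchID", "ClientIP").
watch_client = [(i, i % 3) for i in range(10)]  # 10 unique keys over 10 rows

print(should_skip_partial_aggregation(user_ids))
print(should_skip_partial_aggregation(watch_client))
```

On the real `hits` table, the same ratio could be estimated by comparing a distinct count of the grouping key against `COUNT(*)` before choosing the benchmark query.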



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

