goldmedal commented on issue #15383: URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2817168799
I have another implementation for this issue: https://github.com/goldmedal/datafusion/pull/4

The idea is to fetch rows by the indices in the selection vector instead of iterating over every row in the batch. Because this may involve many changes, I want to check whether the approach makes sense.

Currently, I have only implemented `GroupValuesPrimitive::intern` for the group-by values. For aggregation, I have only implemented `count` and some aggregations that use `GroupsAccumulator`. I also optimized the sv-mode repartition: https://github.com/apache/datafusion/pull/15423/files#r2051721176

However, I found that performance does not improve for ClickBench queries 4 and 7.

```
Query 4: SELECT COUNT(DISTINCT "UserID") FROM hits;
Query 7: SELECT "AdvEngineID", COUNT(*) FROM hits WHERE "AdvEngineID" <> 0 GROUP BY "AdvEngineID" ORDER BY COUNT(*) DESC;

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃ feat_hash-agg-sv-disable ┃ feat_hash-agg-sv ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 4     │                 311.74ms │         320.72ms │ no change │
│ QQuery 7     │                  30.28ms │          29.01ms │ no change │
└──────────────┴──────────────────────────┴──────────────────┴───────────┘
```

I'm not sure if I'm on the right track 🤔
@Dandandan @Rachelint Do you have any suggestions?
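
To make the selection-vector idea concrete, here is a minimal standalone sketch in plain Rust. It is not the code from the PR and does not use the Arrow or DataFusion APIs. The function names (`mask_to_selection`, `count_by_key_with_selection`, `count_by_key_materialized`) and the `HashMap`-based counting are assumptions made only for illustration. It contrasts counting per group directly through row indices with first materializing a filtered column and then scanning it.

```rust
use std::collections::HashMap;

/// Hypothetical helper: turn a boolean filter mask into a selection vector,
/// i.e. the list of row indices that passed the filter.
fn mask_to_selection(mask: &[bool]) -> Vec<usize> {
    let mut selection = Vec::new();
    for (row, &keep) in mask.iter().enumerate() {
        if keep {
            selection.push(row);
        }
    }
    selection
}

/// Hypothetical sv-mode aggregation: count rows per group key, visiting only
/// the rows named in `selection` and reading keys from the unfiltered batch.
fn count_by_key_with_selection(keys: &[u32], selection: &[usize]) -> HashMap<u32, u64> {
    let mut counts = HashMap::new();
    for &row in selection {
        *counts.entry(keys[row]).or_insert(0) += 1;
    }
    counts
}

/// Baseline for comparison: copy the surviving keys into a new column first,
/// then aggregate over every row of that smaller column.
fn count_by_key_materialized(keys: &[u32], mask: &[bool]) -> HashMap<u32, u64> {
    let mut filtered = Vec::new();
    for (&key, &keep) in keys.iter().zip(mask) {
        if keep {
            filtered.push(key);
        }
    }
    let mut counts = HashMap::new();
    for key in filtered {
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}
```

Both functions should return the same counts for the same filter. The selection-vector variant skips the intermediate copy, but it reads keys through an extra index, so whether it is faster likely depends on filter selectivity and access pattern.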