goldmedal commented on issue #15383:
URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2817168799

I have another implementation for this issue: https://github.com/goldmedal/datafusion/pull/4
The idea is to access rows through the indices in the selection vector instead of iterating over every row in the batch.
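The two access patterns can be contrasted in a minimal sketch. It simplifies things considerably: a column is a plain `&[u64]` slice rather than an Arrow array, and `build_selection`, `compact`, `sum_all`, and `sum_selected` are hypothetical helpers, not DataFusion APIs. The dense path copies the surviving rows into a compacted column before consuming them. The selection-vector path leaves the batch untouched and visits only the listed indices.

```rust
/// Build a selection vector: the indices of the rows that pass a predicate.
/// The predicate `value != 0` is an illustrative stand-in that echoes the
/// `"AdvEngineID" <> 0` filter of ClickBench query 7.
pub fn build_selection(values: &[u64]) -> Vec<u32> {
    values
        .iter()
        .enumerate()
        .filter(|&(_, &v)| v != 0)
        .map(|(i, _)| i as u32)
        .collect()
}

/// Dense path, step 1: copy the selected rows into a new compacted column.
pub fn compact(values: &[u64], selection: &[u32]) -> Vec<u64> {
    selection.iter().map(|&i| values[i as usize]).collect()
}

/// Dense path, step 2: the consumer scans every row of the compacted column.
pub fn sum_all(column: &[u64]) -> u64 {
    column.iter().sum()
}

/// Selection-vector path: no copy. The consumer reads the original column
/// only at the indices listed in `selection`.
pub fn sum_selected(values: &[u64], selection: &[u32]) -> u64 {
    selection.iter().map(|&i| values[i as usize]).sum()
}
```

Both paths produce the same result. The selection-vector path avoids the copy, but it replaces sequential reads with indexed ones, so it is not automatically faster.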
   
Because this may involve many changes, I'd like to check whether the approach makes sense before going further.
So far, I have only implemented `GroupValuesPrimitive::intern` for the group-by values. For the aggregations, I have only implemented `count` and some of the aggregations that use `GroupsAccumulator`.
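The following simplified sketch interns group keys and counts rows per group while touching only the selected rows. `SimpleGroupValues` and `SimpleCountAccumulator` are hypothetical stand-ins that assume `u64` keys. They are not DataFusion's `GroupValuesPrimitive` or `GroupsAccumulator`, whose methods operate on Arrow arrays and take more parameters.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a primitive group-values table:
/// maps each distinct key to a dense group index.
#[derive(Default)]
pub struct SimpleGroupValues {
    map: HashMap<u64, usize>,
    pub keys: Vec<u64>,
}

impl SimpleGroupValues {
    /// Intern only the rows listed in `selection`. `groups` is cleared first,
    /// then receives one group index per *selected* row.
    pub fn intern_selected(&mut self, values: &[u64], selection: &[u32], groups: &mut Vec<usize>) {
        groups.clear();
        for &row in selection {
            let key = values[row as usize];
            let next = self.keys.len();
            // Existing keys keep their index; a new key gets the next one.
            let idx = *self.map.entry(key).or_insert(next);
            if idx == next {
                self.keys.push(key);
            }
            groups.push(idx);
        }
    }
}

/// Hypothetical count accumulator: one counter per group index.
#[derive(Default)]
pub struct SimpleCountAccumulator {
    pub counts: Vec<u64>,
}

impl SimpleCountAccumulator {
    /// `group_indices` holds the group of each selected row, so rows that
    /// were filtered out upstream are never visited here.
    pub fn update(&mut self, group_indices: &[usize], total_num_groups: usize) {
        self.counts.resize(total_num_groups, 0);
        for &g in group_indices {
            self.counts[g] += 1;
        }
    }
}
```

The accumulator never needs to know about the selection vector. Because the interning step emits group indices only for the selected rows, the accumulator sees an already-filtered stream of group indices.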
   
I also made some optimizations to the selection-vector (sv) mode repartition: https://github.com/apache/datafusion/pull/15423/files#r2051721176.
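The linked change is not reproduced here. The following sketch is a hypothetical illustration of what a selection-vector mode can mean for hash repartitioning. Instead of copying each output partition's rows into new batches, each partition receives its own list of row indices into the shared batch. `partition_selection` is an illustrative helper, and `key % num_partitions` is a placeholder for a real hash function.

```rust
/// Split the selected rows of a batch into one selection vector per output
/// partition, without copying any column data. The routing `key % num_partitions`
/// is a placeholder for a real hash function.
pub fn partition_selection(keys: &[u64], selection: &[u32], num_partitions: usize) -> Vec<Vec<u32>> {
    assert!(num_partitions > 0, "need at least one partition");
    let mut parts = vec![Vec::new(); num_partitions];
    for &row in selection {
        let p = (keys[row as usize] % num_partitions as u64) as usize;
        parts[p].push(row);
    }
    parts
}
```

Rows keep their original positions, and every partition points back into the same batch.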
   
However, the performance does not improve for ClickBench queries 4 and 7:
   ```
   Query 4: SELECT COUNT(DISTINCT "UserID") FROM hits;
   Query 7: SELECT "AdvEngineID", COUNT(*) FROM hits WHERE "AdvEngineID" <> 0 GROUP BY "AdvEngineID" ORDER BY COUNT(*) DESC;
   
   --------------------
   Benchmark clickbench_1.json
   --------------------
   ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
   ┃ Query        ┃ feat_hash-agg-sv-disable ┃ feat_hash-agg-sv ┃    Change ┃
   ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
   │ QQuery 4     │                 311.74ms │         320.72ms │ no change │
   │ QQuery 7     │                  30.28ms │          29.01ms │ no change │
   └──────────────┴──────────────────────────┴──────────────────┴───────────┘
   ```
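The relative change behind the "no change" verdicts can be computed as (candidate − baseline) / baseline. With the timings from the table, this gives roughly +2.9% (slightly slower) for query 4 and roughly −4.2% (slightly faster) for query 7. `relative_change_pct` is just an illustrative helper for that arithmetic.

```rust
/// Relative change of the candidate run against the baseline, in percent.
/// Positive means the candidate is slower; negative means it is faster.
pub fn relative_change_pct(baseline_ms: f64, candidate_ms: f64) -> f64 {
    (candidate_ms - baseline_ms) / baseline_ms * 100.0
}
```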
I'm not sure whether I'm on the right track 🤔
   
@Dandandan @Rachelint Do you have any suggestions for this approach?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


