
via GitHub Tue, 12 Sep 2023 11:43:35 -0700


Dandandan commented on issue #6782:
URL: https://github.com/apache/arrow-datafusion/issues/6782#issuecomment-1716241284


   For the h2o.ai grouping benchmark these are my current educated guesses:
   
   * q6 seems slow because of `median`, which I expect to be improved in DataFusion 30. Also, `median` and `stddev` don't support `GroupsAccumulator`; that support could be implemented.
   * q9 is `select id2, id4, pow(corr(v1, v2), 2) as r2 from h2o group by id2, id4;`. It seems likely to me that the `covariance` aggregation is relatively slow. Also, it doesn't support `GroupsAccumulator`.
   * q10 is `select id1, id2, id3, id4, id5, id6, sum(v3) as v3, count(*) as count from h2o group by id1, id2, id3, id4, id5, id6;`. Perhaps grouping by 6 columns is relatively slow?
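
To illustrate why `GroupsAccumulator` support might matter for q9, here is a minimal Python sketch of the general idea: keeping the running sums for all groups in flat, group-indexed arrays and updating them one batch at a time, rather than holding one accumulator object per group. This is only an analogy and not DataFusion's Rust implementation; the class `GroupedCorr`, its method names, and the sample data are hypothetical. It also uses plain running sums, which are simpler but less numerically stable than an online (Welford-style) update.

```python
import math


class GroupedCorr:
    """Hypothetical group-indexed accumulator for corr(x, y).

    State for every group lives in parallel lists indexed by a dense
    group id, so a whole batch is folded in with one pass.
    """

    def __init__(self):
        self.n, self.sx, self.sy = [], [], []
        self.sxx, self.syy, self.sxy = [], [], []

    def update_batch(self, xs, ys, group_ids, total_groups):
        # Grow the state so that every group id seen so far has a slot.
        grow = total_groups - len(self.n)
        if grow > 0:
            for col in (self.n, self.sx, self.sy, self.sxx, self.syy, self.sxy):
                col.extend([0.0] * grow)
        # Fold each row into the slot of its group.
        for x, y, g in zip(xs, ys, group_ids):
            self.n[g] += 1
            self.sx[g] += x
            self.sy[g] += y
            self.sxx[g] += x * x
            self.syy[g] += y * y
            self.sxy[g] += x * y

    def evaluate(self):
        """Return one correlation per group (None when it is undefined)."""
        out = []
        for g in range(len(self.n)):
            n = self.n[g]
            var_x = n * self.sxx[g] - self.sx[g] ** 2
            var_y = n * self.syy[g] - self.sy[g] ** 2
            if n < 2 or var_x <= 0 or var_y <= 0:
                out.append(None)
                continue
            cov = n * self.sxy[g] - self.sx[g] * self.sy[g]
            out.append(cov / math.sqrt(var_x * var_y))
        return out


# Two groups in one batch: group 0 is perfectly correlated,
# group 1 is perfectly anti-correlated.
acc = GroupedCorr()
acc.update_batch(
    xs=[1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    ys=[2.0, 3.0, 4.0, 2.0, 6.0, 1.0],
    group_ids=[0, 1, 0, 1, 0, 1],
    total_groups=2,
)
corr = acc.evaluate()
# Mirrors pow(corr(v1, v2), 2) as r2 from q9.
r2 = [None if c is None else c ** 2 for c in corr]
```

The point of the layout is that the per-row work is a handful of indexed additions with no per-group object allocation or dynamic dispatch, which is the kind of saving a grouped accumulator is meant to provide.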



[GitHub] [arrow-datafusion] Dandandan commented on issue #6782: Write DataFusion paper for (SIGMOD / VLDB / ICDE)
