Dandandan edited a comment on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-894652913
My results on the latest version:

```
q1 took 36 ms
q2 took 358 ms
q3 took 998 ms
q4 took 50 ms
q5 took 983 ms
q7 took 911 ms
q10 took 4075 ms
```

q4 is improved in the latest version compared to earlier (it used an int32 column to group on). q2 still looks a bit (~10%) slower. The query is:

`SELECT id1, id2, SUM(v1) AS v1 FROM tbl GROUP BY id1, id2` (`id1` and `id2` are utf8, `v1` is an int32)

I am wondering whether addressing this comment https://github.com/apache/arrow-datafusion/pull/808/files#r683975473 might help a bit, as the code there does some additional cloning of `ScalarValue`s.

Another cause could be that hashing and comparing one `Vec<u8>` might be faster than hashing two single strings and combining them afterwards (however, I would expect the extra copying / rehashing to be worse than the single cost of hashing itself).
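To make the two hashing strategies from the last point concrete, here is a minimal sketch using only the Rust standard library. It is not DataFusion's actual implementation: the function names (`hash_packed`, `hash_one`, `combine`, `hash_combined`), the length-prefixed packing, and the multiply-and-add mixer are all assumptions chosen for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Strategy A (hypothetical): copy both key columns into one byte buffer
/// and hash that buffer once. Each value is length-prefixed so that
/// ("ab", "c") and ("a", "bc") produce different buffers.
fn hash_packed(id1: &str, id2: &str) -> u64 {
    let mut buf: Vec<u8> = Vec::with_capacity(id1.len() + id2.len() + 16);
    for s in [id1, id2].iter() {
        buf.extend_from_slice(&(s.len() as u64).to_le_bytes());
        buf.extend_from_slice(s.as_bytes());
    }
    let mut hasher = DefaultHasher::new();
    buf.hash(&mut hasher);
    hasher.finish()
}

/// Hash a single string value on its own.
fn hash_one(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

/// Illustrative mixer for two 64-bit hashes (an assumption, not a
/// recommendation): multiply the left hash by a small odd constant and
/// add the right hash, wrapping on overflow.
fn combine(left: u64, right: u64) -> u64 {
    left.wrapping_mul(37).wrapping_add(right)
}

/// Strategy B (hypothetical): hash each column separately, then combine.
/// No bytes are copied, but the hasher is set up and finished twice.
fn hash_combined(id1: &str, id2: &str) -> u64 {
    combine(hash_one(id1), hash_one(id2))
}
```

Timing both functions over the same list of `(id1, id2)` pairs (for example with `std::time::Instant`) would show whether the copy in strategy A or the second hasher pass in strategy B dominates for a given key length. The sketch deliberately ignores the comparison cost on hash collisions, which the comment above also mentions.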
