Dandandan edited a comment on pull request #8961: URL: https://github.com/apache/arrow/pull/8961#issuecomment-748437288
I checked this with the other PR, https://github.com/apache/arrow/pull/8965, merged; that PR improves the join implementation. Besides being much faster regardless of this PR, reordering gives a further ~15% reduction in execution time on the following query (6001214 vs 1499999 rows in the two joined tables):

```
select
    l_shipmode,
    sum(case
            when o_orderpriority = '1-URGENT'
                or o_orderpriority = '2-HIGH'
                then 1
            else 0
        end) as high_line_count,
    sum(case
            when o_orderpriority <> '1-URGENT'
                and o_orderpriority <> '2-HIGH'
                then 1
            else 0
        end) as low_line_count
from
    lineitem
join
    orders
on
    l_orderkey = o_orderkey
group by
    l_shipmode
order by
    l_shipmode;
```
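For readers unfamiliar with why table order can matter in a hash join, here is a minimal Python sketch. It is not the DataFusion code from either PR. It assumes that "reordering" here means choosing the smaller input as the build side of the hash table, and the function name `hash_join` and the toy rows are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner hash join over lists of dicts.

    Illustrative assumption: the hash table is always built on the
    smaller input, and the larger input is streamed through it.
    """
    swapped = len(left) > len(right)
    build, probe = (right, left) if swapped else (left, right)
    build_key, probe_key = (
        (right_key, left_key) if swapped else (left_key, right_key)
    )

    # Build phase: index the smaller input by its join key.
    table = defaultdict(list)
    for row in build:
        table[row[build_key]].append(row)

    # Probe phase: look up each row of the larger input.
    out = []
    for row in probe:
        for match in table.get(row[probe_key], ()):
            # Restore the original (left, right) column order.
            l, r = (row, match) if swapped else (match, row)
            out.append({**l, **r})
    return out


# Toy stand-ins for the two tables in the query above.
lineitem = [
    {"l_orderkey": 1, "l_shipmode": "AIR"},
    {"l_orderkey": 1, "l_shipmode": "MAIL"},
    {"l_orderkey": 2, "l_shipmode": "AIR"},
    {"l_orderkey": 3, "l_shipmode": "SHIP"},
]
orders = [
    {"o_orderkey": 1, "o_orderpriority": "1-URGENT"},
    {"o_orderkey": 2, "o_orderpriority": "5-LOW"},
]

joined = hash_join(lineitem, orders, "l_orderkey", "o_orderkey")
```

In this sketch the hash table holds only the rows of the smaller input, so with row counts like those quoted above it would index roughly 1.5 million rows instead of 6 million. How much of the measured ~15% comes from this effect is not something the sketch can show.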
