Dandandan commented on PR #471:
URL: https://github.com/apache/arrow-ballista/pull/471#issuecomment-1295494810

   No problems anymore.
   
   We can see it now switches to the right semi join in q18 (20% improvement on 
my machine):
   
   ```
         HashJoinExec: mode=Partitioned, join_type=RightSemi, on=[(Column { name: "l_orderkey", index: 0 }, Column { name: "o_orderkey", index: 2 })], metrics=[output_rows=4368, input_rows=59985993, input_batches=3550, output_batches=3550, join_time=1.121241193s]
   ```
   
   Before:
   
   ```
         HashJoinExec: mode=Partitioned, join_type=Semi, on=[(Column { name: "o_orderkey", index: 2 }, Column { name: "l_orderkey", index: 0 })], metrics=[output_rows=4368, input_batches=32, output_batches=32, input_rows=4992, join_time=150.113398ms]
   ```
   
   
   Also q20 has a ~10% improvement compared to master.
   
   `HashJoinExec` doesn't report the full metrics: `join_time` doesn't 
include building the hash map, which is the more expensive part of a join. 
Setting the bitmap isn't currently included in `join_time` either.
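
To illustrate why a probe-only timer can understate the cost of a join, here is a minimal sketch of a toy semi join that times the build and probe phases separately. It is not DataFusion's `HashJoinExec` or its metrics API: `JoinTimings`, `build_time`, `probe_time`, and `semi_join_timed` are hypothetical names invented for this example, and the keys are plain `i64` values rather than Arrow batches.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Hypothetical per-phase timings for a toy join (not DataFusion's metrics API).
#[derive(Debug, Default)]
pub struct JoinTimings {
    /// Time spent hashing the build side.
    pub build_time: Duration,
    /// Time spent probing the hash set with the other side.
    pub probe_time: Duration,
}

/// Toy semi join on `i64` keys: returns every probe-side key that has at
/// least one match on the build side, plus separate timings for each phase.
pub fn semi_join_timed(build_keys: &[i64], probe_keys: &[i64]) -> (Vec<i64>, JoinTimings) {
    let mut timings = JoinTimings::default();

    // Build phase: insert all build-side keys into a hash set.
    let start = Instant::now();
    let table: HashSet<i64> = build_keys.iter().copied().collect();
    timings.build_time = start.elapsed();

    // Probe phase: keep each probe-side key that exists in the hash set.
    let start = Instant::now();
    let matched: Vec<i64> = probe_keys
        .iter()
        .copied()
        .filter(|key| table.contains(key))
        .collect();
    timings.probe_time = start.elapsed();

    (matched, timings)
}
```

Calling `semi_join_timed(&build, &probe)` returns the matched keys together with both durations. Reporting only `probe_time` would hide the build cost, which is the kind of gap described above for `join_time`.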

