Dandandan commented on PR #471:
URL: https://github.com/apache/arrow-ballista/pull/471#issuecomment-1295494810
No problems anymore.
We can see it now switches to a right semi join in q18 (a 20% improvement on
my machine):
```
HashJoinExec: mode=Partitioned, join_type=RightSemi,
  on=[(Column { name: "l_orderkey", index: 0 }, Column { name: "o_orderkey", index: 2 })],
  metrics=[output_rows=4368, input_rows=59985993, input_batches=3550,
           output_batches=3550, join_time=1.121241193s]
```
Before:
```
HashJoinExec: mode=Partitioned, join_type=Semi,
  on=[(Column { name: "o_orderkey", index: 2 }, Column { name: "l_orderkey", index: 0 })],
  metrics=[output_rows=4368, input_batches=32, output_batches=32,
           input_rows=4992, join_time=150.113398ms]
```
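The two plans differ in which input is hashed. This assumes the left input of `HashJoinExec` is the build side that goes into the hash table. Under that assumption, switching from `Semi` to `RightSemi` moves the hash table from the `o_orderkey` input to the `l_orderkey` input. The Python sketch below shows the two variants. The function names are hypothetical and are not Ballista or DataFusion APIs.

A left semi join must emit build-side rows, so it tracks matches in a visited bitmap and can only produce output once the probe side is exhausted. A right semi join emits probe-side rows and only needs key membership on the build side.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple  # hypothetical row type: a tuple of column values


def build_hash_table(build_rows: Sequence[Row], key_idx: int) -> Dict[object, List[int]]:
    """Build phase: map each join key to the indices of the build rows that carry it."""
    table: Dict[object, List[int]] = {}
    for i, row in enumerate(build_rows):
        table.setdefault(row[key_idx], []).append(i)
    return table


def left_semi_join(build_rows: Sequence[Row], build_key: int,
                   probe_rows: Sequence[Row], probe_key: int) -> List[Row]:
    """Emit build-side (left) rows that match at least one probe-side row."""
    table = build_hash_table(build_rows, build_key)
    visited = [False] * len(build_rows)  # one bit per build row
    for row in probe_rows:
        for i in table.get(row[probe_key], ()):
            visited[i] = True  # "setting the bitmap"
    # Output is only known after the whole probe side has been consumed.
    return [r for r, hit in zip(build_rows, visited) if hit]


def right_semi_join(build_rows: Sequence[Row], build_key: int,
                    probe_rows: Sequence[Row], probe_key: int) -> List[Row]:
    """Emit probe-side (right) rows whose key exists on the build side."""
    keys = {row[build_key] for row in build_rows}  # membership is all we need
    return [row for row in probe_rows if row[probe_key] in keys]
```

Usage: the same logical semi join can hash either the large input or the small one.

```python
big = [(k % 5, f"item{k}") for k in range(20)]  # (orderkey, payload)
small = [(1,), (3,)]                            # qualifying keys

before = left_semi_join(big, 0, small, 0)   # hash table over the large input
after = right_semi_join(small, 0, big, 0)   # hash table over the small input
assert before == after
```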
Also, q20 shows a ~10% improvement compared to master.
Note that HashJoinExec doesn't report the full metrics. `join_time` doesn't
include building the hash map, which is the more expensive part of a join, and
setting the bitmap isn't currently included in `join_time` either.
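A `join_time` that covers only probing can understate a join whose cost is dominated by the build phase. The sketch below times the build phase and the probe phase separately. `JoinMetrics`, `build_time` and `timed_right_semi_join` are hypothetical names for illustration, not HashJoinExec's actual metric set.

```python
import time
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Row = Tuple  # hypothetical row type: a tuple of column values


@dataclass
class JoinMetrics:
    build_time: float = 0.0  # seconds spent building the hash table (hypothetical metric)
    join_time: float = 0.0   # seconds spent probing only
    output_rows: int = 0


def timed_right_semi_join(build_rows: Sequence[Row], build_key: int,
                          probe_rows: Sequence[Row], probe_key: int,
                          metrics: JoinMetrics) -> List[Row]:
    # Build phase: timed on its own so it is not hidden from the metrics.
    start = time.perf_counter()
    keys = {row[build_key] for row in build_rows}
    metrics.build_time += time.perf_counter() - start

    # Probe phase: this is the part a probe-only join_time would capture.
    start = time.perf_counter()
    out = [row for row in probe_rows if row[probe_key] in keys]
    metrics.join_time += time.perf_counter() - start

    metrics.output_rows += len(out)
    return out
```

Reporting `build_time + join_time` gives the full cost of the operator, which is what a fair comparison of the two plans above needs.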