Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3077: Enable runtime filters when PHJ spills
......................................................................


Patch Set 6: Code-Review+1

(1 comment)

Did you test this with partitioned joins disabled, since that code path changed 
too? (I can't wait to get rid of it.)

The performance results look OK to me. Hash table builds aren't typically the 
biggest bottleneck, and Skye's codegen patch will also speed them up, so I'm 
not too concerned.

Spilling should also be much faster, since we can reduce the amount of 
probe-side data that has to be spilled.

http://gerrit.cloudera.org:8080/#/c/2783/6/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 489:   // Use total_build_rows to estimate FP-rate of Bloom filter, and publish 'always-true'
I wonder if we should adjust the heuristic if we're spilling, since the 
cost/benefit is quite different: even if the FP rate is high, it's still 
probably worth it if we write less data to disk.

Don't need to tackle it in this patch, just a thought.
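
To make the idea concrete, here is a rough sketch of a spill-aware cutoff. 
It is not taken from the patch: the function names, the parameters, and both 
threshold values are hypothetical, and the FP-rate estimate is the standard 
textbook Bloom filter approximation rather than whatever the node actually 
computes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch only: none of these names or constants come from the
// Impala source.

// Textbook estimate of a Bloom filter's false-positive rate with
// `filter_bits` bits (m) and `num_hashes` hash functions (k) after inserting
// `build_rows` distinct keys (n):  (1 - exp(-k * n / m))^k
double EstimateFpRate(int64_t build_rows, int64_t filter_bits, int num_hashes) {
  assert(build_rows >= 0 && filter_bits > 0 && num_hashes > 0);
  const double k = num_hashes;
  const double fill = 1.0 - std::exp(-k * build_rows / filter_bits);
  return std::pow(fill, k);
}

// Returns true if the filter is expected to be too inaccurate to be useful
// and an 'always-true' filter should be published instead.
// When the join is spilling, a much higher FP rate is tolerated, because
// every probe row the filter rejects is a row that is never written to disk.
bool ShouldPublishAlwaysTrue(int64_t build_rows, int64_t filter_bits,
                             int num_hashes, bool is_spilling) {
  constexpr double kMaxFpRate = 0.75;          // assumed in-memory cutoff
  constexpr double kMaxFpRateSpilling = 0.95;  // assumed, looser cutoff
  const double max_fp_rate = is_spilling ? kMaxFpRateSpilling : kMaxFpRate;
  return EstimateFpRate(build_rows, filter_bits, num_hashes) > max_fp_rate;
}
```

With these made-up numbers, a filter with an estimated FP rate of about 0.93 
would be dropped for an in-memory join but kept for a spilling one. The right 
cutoffs would have to come from measurements.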


-- 
To view, visit http://gerrit.cloudera.org:8080/2783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I59a2d9ee03ccea6b674392584e4c7f272233571e
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
