Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3077: Enable runtime filters when PHJ spills
......................................................................

Patch Set 6: Code-Review+1

(1 comment)

Did you test this with the partitioned joins disabled, since that code changed? (Can't wait to get rid of it.)

The performance results look OK to me. Hash table builds aren't typically the biggest bottleneck, and we will also get a speedup of hash table builds with Skye's codegen patch, so I'm not too concerned. Spilling should also be much faster, since we can reduce the amount of probe-side data that has to be spilled.

http://gerrit.cloudera.org:8080/#/c/2783/6/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 489: // Use total_build_rows to estimate FP-rate of Bloom filter, and publish 'always-true'
I wonder if we should adjust the heuristic if we're spilling, since the cost/benefit is quite different: even if the FP rate is high, it's still probably worth it if we write less data to disk. Don't need to tackle it in this patch, just a thought.

--
To view, visit http://gerrit.cloudera.org:8080/2783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I59a2d9ee03ccea6b674392584e4c7f272233571e
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
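
As a rough illustration of the trade-off raised in the comment on line 489, the sketch below estimates a Bloom filter's false-positive rate from the number of build rows and applies a looser cutoff when the join is spilling. It is not the code in partitioned-hash-join-node.cc: the function names, parameters, and both threshold values are assumptions made up for this example, and only the false-positive formula itself is the standard textbook estimate.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Standard Bloom filter false-positive estimate for n inserted keys,
// m bits and k hash functions: (1 - e^(-k*n/m))^k.
// Hypothetical helper, not an Impala function.
double EstimateFpRate(int64_t num_keys, int64_t num_bits, int num_hashes) {
  assert(num_hashes > 0);
  if (num_keys <= 0) return 0.0;  // An empty filter never matches.
  if (num_bits <= 0) return 1.0;  // No space: every probe passes.
  double exponent = -static_cast<double>(num_hashes) * num_keys / num_bits;
  return std::pow(1.0 - std::exp(exponent), num_hashes);
}

// Hypothetical policy: publish the real filter only if its estimated
// FP rate is below a cutoff; otherwise the caller would publish an
// 'always-true' filter. The cutoff is relaxed when spilling, because
// each row filtered out is a row that is not written to disk.
bool ShouldPublishFilter(int64_t total_build_rows, int64_t num_bits,
                         int num_hashes, bool is_spilling) {
  const double kMaxFpRate = 0.75;          // Assumed value, in-memory case.
  const double kMaxFpRateSpilling = 0.95;  // Assumed value, spilling case.
  double fp_rate = EstimateFpRate(total_build_rows, num_bits, num_hashes);
  return fp_rate < (is_spilling ? kMaxFpRateSpilling : kMaxFpRate);
}
```

With these assumed cutoffs, a filter whose estimated FP rate falls between the two values is dropped for an in-memory join but still published for a spilling one, which is the behaviour the comment suggests might be worth having.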
