[Impala-CR](cdh5-trunk) IMPALA-3007: Adjust Bloom Filter size according to NDV estimate

Henry Robinson (Code Review) Thu, 28 Apr 2016 17:00:43 -0700

Henry Robinson has posted comments on this change.

Change subject: IMPALA-3007: Adjust Bloom Filter size according to NDV estimate
......................................................................



Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/2812/4/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 143:     int64_t filter_size = state_->filter_bank()->GetFilterSizeForNdv(
> i think it's best to compute the size during initialization of the filter c
Done


http://gerrit.cloudera.org:8080/#/c/2812/4/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 522:         ctx.filter->filter_desc().ndv_estimate);
> why not record the actual size in the filter context. that way, you can't h
It's a bit more convenient to put it in the RuntimeFilter itself (that way the 
RuntimeFilterBank can set it without the caller having to do so).


Line 524:     bool fp_rate_too_high =
> don't we check this periodically, or is that only done in the parquet scann
Only in the parquet scanner. Not clear if there'd be an advantage to doing it 
periodically here as doing so would add a branch back to the filter building 
path.


http://gerrit.cloudera.org:8080/#/c/2812/4/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 518:       // estimates may have changed as well.
> looking at this again, i'm not sure i follow that argument. in what way hav
After further digging in, I think this is a bug. With a join-node that has a 
union as build input and a limit on the union, the cardinality estimate after 
single-node planning (when the filters are originally generated) is incorrect. 
It does not apply the limit.

After fragmentation (and the insertion of an xchg into the build input) the 
cardinality corrects itself. 

See IMPALA-3450 for more details.


http://gerrit.cloudera.org:8080/#/c/2812/4/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test
File 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test:

Line 6: # consumption / spilling behaviour.
> move that last sentence to the very top, as a separate comment, because it 
Not sure exactly where you mean: this is the only query this applies to. (The 
nested-types query below is here because the old HJ implementation doesn't work 
with nested-types).


Line 26: 
> reign in the blank lines
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/2812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1fe37b8d4cfb3c52bb8e8cf0ca55e92665b87803
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3007: Adjust Bloom Filter size according to NDV estimate

Reply via email to