Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3007: Adjust Bloom Filter size according to NDV estimate
......................................................................


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/exec/hash-join-node.cc
File be/src/exec/hash-join-node.cc:

Line 230:   hash_tbl_->AddBloomFilters();
huh?


http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 156:     uint32_t log_space = 
state->filter_bank()->GetLogSpaceForNdv(filter.ndv_estimate);
why not have a GetFilterByteSize() or something like that? the scan node 
shouldn't have to think about the details of the filter implementation.
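
A minimal sketch of what such a helper could look like, assuming a classic Bloom filter sized with the textbook formula m = -n * ln(p) / (ln 2)^2 bits and rounded up to a power of two. The signature and the fpp parameter are hypothetical and not taken from the patch; the actual filter implementation may size itself differently.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of the patch): returns a power-of-two byte
// size large enough to hold num_entries distinct values at a target
// false-positive probability fpp, where 0 < fpp < 1.
// Uses the textbook Bloom filter sizing m = -n * ln(p) / (ln 2)^2 bits.
int64_t GetFilterByteSize(int64_t num_entries, double fpp) {
  const double kLn2 = std::log(2.0);
  const double bits_needed =
      -static_cast<double>(num_entries) * std::log(fpp) / (kLn2 * kLn2);
  // Assumed upper bound (1 TiB) so the shift below cannot overflow.
  const int64_t kMaxBytes = int64_t{1} << 40;
  int64_t bytes = 1;
  // Round up to the next power of two.
  while (bytes < kMaxBytes && static_cast<double>(bytes) * 8.0 < bits_needed) {
    bytes <<= 1;
  }
  return bytes;
}
```

With something like this, the scan node would only ask for a byte size and never see the log-space representation.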


http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 148:           filters_[i]->filter_desc().filter_id);
this takes the capacity, not an id.


http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 500:         state->filter_bank()->FpRateTooHigh(ndv_estimate, 
total_build_rows);
just as an aside: instead of looking at build rows, which is indirect, why not 
look at the total number of bits set in the bloom filter? that should 
give you a clear indication of how much more it can absorb.

leave todo.
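
A rough sketch of the bits-set idea, assuming the filter's bits are available as an array of 64-bit words and that probes are roughly independent. Both functions and the std::vector representation are hypothetical stand-ins. For a blocked filter layout the fill^k estimate is only an approximation.

```cpp
#include <bitset>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the filter's bit array; the real BloomFilter
// layout may differ. Returns the fraction of bits that are set, in [0, 1].
double FillRatio(const std::vector<uint64_t>& words) {
  if (words.empty()) return 0.0;
  int64_t set_bits = 0;
  for (uint64_t w : words) {
    set_bits += static_cast<int64_t>(std::bitset<64>(w).count());
  }
  return static_cast<double>(set_bits) / (64.0 * words.size());
}

// A key that was never inserted passes the filter only if every probed bit
// happens to be set, so with num_hashes independent probes the
// false-positive rate is approximately fill_ratio ^ num_hashes.
double EstimatedFpRate(const std::vector<uint64_t>& words, int num_hashes) {
  return std::pow(FillRatio(words), num_hashes);
}
```

The fill ratio is measured directly on the filter, so it needs no row counts.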


http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/runtime/runtime-filter.cc
File be/src/runtime/runtime-filter.cc:

Line 171:   uint64_t required_space =
let's not use unsigned ints


http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

Line 77:   /// expected false-positive rate would be larger than allowed by
"a filter's expected false-positive rate would exceed 
flags_max_filter_error_rate"?


Line 79:   bool FpRateTooHigh(uint64_t expected_ndv, uint64_t observed_ndv);
role of expected_ndv unclear
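
For reference, a sketch of the kind of check the comment describes, assuming a classic Bloom filter and the textbook estimate p = (1 - e^(-k*n/m))^k. The names and signatures here are hypothetical and are not the ones in runtime-filter.h; max_fp_rate stands in for the flag mentioned above.

```cpp
#include <cmath>
#include <cstdint>

// Textbook expected false-positive rate of a Bloom filter with num_bits bits
// and num_hashes hash functions after num_entries distinct insertions:
//   p = (1 - e^(-k * n / m))^k
double ExpectedFpRate(int64_t num_bits, int num_hashes, int64_t num_entries) {
  if (num_bits <= 0) return 1.0;     // a filter with no bits rejects nothing
  if (num_entries <= 0) return 0.0;  // an empty filter rejects everything
  const double k = num_hashes;
  const double exponent = -k * static_cast<double>(num_entries) /
                          static_cast<double>(num_bits);
  return std::pow(1.0 - std::exp(exponent), k);
}

// Hypothetical check: true if the expected false-positive rate exceeds the
// configured maximum.
bool ExceedsMaxFpRate(int64_t num_bits, int num_hashes, int64_t num_entries,
                      double max_fp_rate) {
  return ExpectedFpRate(num_bits, num_hashes, num_entries) > max_fp_rate;
}
```

In this form the filter's size and the number of inserted entries are separate inputs, so each argument has a single role.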


Line 94:   BloomFilter* AllocateScratchBloomFilter(int64_t ndv_estimate);
> Will update comment.
instead of continuing to talk about ndv and estimates, which are fe concepts, 
please switch to different terminology throughout (num_entries? capacity?).


http://gerrit.cloudera.org:8080/#/c/2812/1/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 415:         filter.computeNdvEstimate();
this also needs to happen for repartitioning joins


http://gerrit.cloudera.org:8080/#/c/2812/1/fe/src/main/java/com/cloudera/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/com/cloudera/impala/planner/RuntimeFilterGenerator.java:

Line 113:     // Estimate of the number of distinct values that will be 
inserted into this filter.
explain meaning of -1

also, unclear whether this is a global or a per-node estimate (there's a 
difference for repartitioning joins)


http://gerrit.cloudera.org:8080/#/c/2812/1/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test:

Line 253: # Test case 11: filters with high expected FP rate get disabled.
what does "expected" mean here?


Line 362:     join (select * from l LIMIT 2000000) b on a.l_orderkey = 
-b.l_orderkey;
pick a number that'll exceed the size limit


http://gerrit.cloudera.org:8080/#/c/2812/1/testdata/workloads/functional-query/queries/QueryTest/runtime_filters_wait.test
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters_wait.test:

Line 37: row_regex: .*0 of 1 Runtime Filters Produced.*
i'm not sure about this message: it makes it sound like something went 
wrong


-- 
To view, visit http://gerrit.cloudera.org:8080/2812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1fe37b8d4cfb3c52bb8e8cf0ca55e92665b87803
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-HasComments: Yes
