Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3007: Adjust Bloom Filter size according to NDV estimate
......................................................................

Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/exec/hash-join-node.cc
File be/src/exec/hash-join-node.cc:

Line 230: hash_tbl_->AddBloomFilters();
huh?


http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 156: uint32_t log_space = state->filter_bank()->GetLogSpaceForNdv(filter.ndv_estimate);
why not have a GetFilterByteSize() or something like that. the scan node shouldn't have to think about the details of the filter implementation


http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 148: filters_[i]->filter_desc().filter_id);
this takes the capacity, not an id.


http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 500: state->filter_bank()->FpRateTooHigh(ndv_estimate, total_build_rows);
just as an aside: instead of looking at build rows, which is indirect, why not look at the total number of bits set in the bloom filter instead? that should give you a clear indication of how much more it can absorb. leave todo.


http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/runtime/runtime-filter.cc
File be/src/runtime/runtime-filter.cc:

Line 171: uint64_t required_space =
let's not use unsigned ints


http://gerrit.cloudera.org:8080/#/c/2812/1/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

Line 77: /// expected false-positive rate would be larger than allowed by
"a filter's expected false-positive rate would exceed flags_max_filter_error_rate"?


Line 79: bool FpRateTooHigh(uint64_t expected_ndv, uint64_t observed_ndv);
role of expected_ndv unclear


Line 94: BloomFilter* AllocateScratchBloomFilter(int64_t ndv_estimate);
> Will update comment.

instead of continuing to talk about ndv and estimates, which are fe concepts, please switch to different terminology throughout (num_entries? capacity?).
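
For reference, the sizing and false-positive questions raised in the runtime-filter comments above can be sketched as follows. This is only an illustration built on the textbook Bloom filter approximation p = (1 - e^(-kn/m))^k; it is not the code in this patch. The function names, the hash count of 8, and the byte-based log space are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names come from the patch.
constexpr int kNumHashes = 8;  // assumed number of hash functions (k)

// Smallest log2(size in bytes) for which a filter holding num_entries keys
// stays at or below max_fp_rate, using p = (1 - exp(-k*n/m))^k solved for m bits.
inline int MinLogSpaceBytes(int64_t num_entries, double max_fp_rate) {
  if (num_entries <= 0) return 0;
  const double k = kNumHashes;
  const double bits = -k * static_cast<double>(num_entries) /
      std::log(1.0 - std::pow(max_fp_rate, 1.0 / k));
  // Round up to a power of two; clamp so tiny inputs do not go negative.
  return std::max(0, static_cast<int>(std::ceil(std::log2(bits / 8.0))));
}

// Byte-size entry point, so a caller never has to reason about log space.
inline int64_t FilterByteSize(int64_t num_entries, double max_fp_rate) {
  return int64_t{1} << MinLogSpaceBytes(num_entries, max_fp_rate);
}

// Expected false-positive rate after inserting num_entries distinct keys.
inline double ExpectedFpRate(int64_t num_entries, int log_space_bytes) {
  const double k = kNumHashes;
  const double bits = 8.0 * std::ldexp(1.0, log_space_bytes);
  return std::pow(1.0 - std::exp(-k * num_entries / bits), k);
}

// Alternative estimate from the observed fill: a lookup is a false positive
// when all k probed bits happen to be set, i.e. roughly (bits_set / m)^k.
inline double FpRateFromBitsSet(int64_t bits_set, int log_space_bytes) {
  const double bits = 8.0 * std::ldexp(1.0, log_space_bytes);
  return std::pow(bits_set / bits, kNumHashes);
}
```

Under these assumptions, FilterByteSize() is the kind of entry point that would keep log space out of the scan node, and FpRateFromBitsSet() corresponds to the aside about counting set bits rather than build rows.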


http://gerrit.cloudera.org:8080/#/c/2812/1/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 415: filter.computeNdvEstimate();
this also needs to happen for repartitioning joins


http://gerrit.cloudera.org:8080/#/c/2812/1/fe/src/main/java/com/cloudera/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/com/cloudera/impala/planner/RuntimeFilterGenerator.java:

Line 113: // Estimate of the number of distinct values that will be inserted into this filter.
explain meaning of -1

also, unclear whether this is globally or locally (there's a difference for repartitioning joins)


http://gerrit.cloudera.org:8080/#/c/2812/1/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test:

Line 253: # Test case 11: filters with high expected FP rate get disabled.
what does "expected" mean here?


Line 362: join (select * from l LIMIT 2000000) b on a.l_orderkey = -b.l_orderkey;
pick a number that'll exceed the size limit


http://gerrit.cloudera.org:8080/#/c/2812/1/testdata/workloads/functional-query/queries/QueryTest/runtime_filters_wait.test
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters_wait.test:

Line 37: row_regex: .*0 of 1 Runtime Filters Produced.*
i'm not sure about this error message, it makes it sound like something went wrong


--
To view, visit http://gerrit.cloudera.org:8080/2812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1fe37b8d4cfb3c52bb8e8cf0ca55e92665b87803
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-HasComments: Yes
