Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3141: Send full filters when filter production is disabled ......................................................................
Patch Set 3: (13 comments) http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/blocking-join-node.h File be/src/exec/blocking-join-node.h: Line 97: /// inside a subplan or the false-positive rate would be too high. we can have more than one filter per build side, each one with its own effective fp rate. leave todo to differentiate more carefully. right now it's based on the logic in phj that disables based on row count. but the variable it uses to indicate it's disabled is here, which breaks abstractions. the control flow is already hard to follow, so it's good to comment a bit more. http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 1723: if (filter->IsFull() || reject_ratio < FLAGS_parquet_min_filter_reject_ratio) { full/invalid filters shouldn't be part of filter_ctxs_ http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 402: // filter due to a spilled partition. is there a todo to separate spilling from filters? Line 521: BOOST_FOREACH(const FilterContext& ctx, filters_) { range for? http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 1992: if (state->completion_time != 0) return; it make more sense to adjust the pending_count in that case to avoid confusion (and comply with the comments in the .h) http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/runtime-filter.cc File be/src/runtime/runtime-filter.cc: Line 106: if (it == consumed_filters_.end()) return; how is this possible again? hasn't got registered yet? Line 113: // TODO: Avoid need for this copy. this is for local targets. why is there a need for this copy? Line 120: // Take lock only to ensure no race with PublishGlobalFilter() - there's no need for since this is a local target, is PublishGlobalFilter even called? (the coordinator should know about local targets) Line 148: if (it->second->filter_desc().is_broadcast_join && it->second->HasBloomFilter()) { would be nice to eliminate this code path in the coordinator as part of this patch, since you're already cleaning up http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/runtime-filter.h File be/src/runtime/runtime-filter.h: Line 201: bool IsFull() const { return HasBloomFilter() && bloom_filter_ == NULL; } i find the 'full' terminology a bit problematic, because we also have BloomFilter::Insert, which suggests a relationship that doesn't exit. how about invalid? that would also make clear why you bail as soon as it becomes invalid. http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: Line 31: // are 64 = 2^6 bytes in a cache line. could you add a static_assert for that? Line 64: void BloomFilter::ToThrift(TBloomFilter* thrift) const { while you're fixing things: you should point out in the .h file that this is destructive. (and if you want to, you can also add a bool BloomFilter::serialized_ and make sure that we don't call Insert or Or after ToThrift). http://gerrit.cloudera.org:8080/#/c/2475/3/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: Line 549: 4: required bool is_full instead of introducing a new concept (full), couldn't we phrase that as effectively by saying it's of size 0? -- To view, visit http://gerrit.cloudera.org:8080/2475 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I04b3e6542651c1e7b77a9bab01d0e3d9506af42f Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-HasComments: Yes
