Henry Robinson has posted comments on this change. Change subject: IMPALA-3141: Send full filters when filter production is disabled ......................................................................
Patch Set 3: (13 comments) http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/blocking-join-node.h File be/src/exec/blocking-join-node.h: Line 97: /// inside a subplan or the false-positive rate would be too high. > we can have more than one filter per build side, each one with its own effe Done http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 1723: if (filter->IsFull() || reject_ratio < FLAGS_parquet_min_filter_reject_ratio) { > full/invalid filters shouldn't be part of filter_ctxs_ Why not? It seems reasonable to me that a context should be able to answer questions about the filter it contains. The alternative is to allow something like: filter_ctx->filter()->IsFull() but I think that breaks abstraction further by exposing the inner implementation. The idea of letting all values pass the filter is general over all implementations (e.g. a filter on a < predicate where the minimum is -MAX_INT). http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 402: // filter due to a spilled partition. > is there a todo to separate spilling from filters? There's a JIRA: IMPALA-3077 Line 521: BOOST_FOREACH(const FilterContext& ctx, filters_) { > range for? Avoiding C++11 in this file which is, in part, cross-compiled. http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 1992: if (state->completion_time != 0) return; > it make more sense to adjust the pending_count in that case to avoid confus Done http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/runtime-filter.cc File be/src/runtime/runtime-filter.cc: Line 106: if (it == consumed_filters_.end()) return; > how is this possible again? hasn't got registered yet? This is a hangover from when filters could be both local and have the coordinator deliver them (before the code explicitly distinguished local and global targets). Line 113: // TODO: Avoid need for this copy. > this is for local targets. why is there a need for this copy? ah, it's not needed after Michael's changes to the way filters are accounted for. Line 120: // Take lock only to ensure no race with PublishGlobalFilter() - there's no need for > since this is a local target, is PublishGlobalFilter even called? (the coor No, it's not any more (the coordinator's not even involved - local targets aren't ever sent to the coordinator). Line 148: if (it->second->filter_desc().is_broadcast_join && it->second->HasBloomFilter()) { > would be nice to eliminate this code path in the coordinator as part of thi Can be removed. http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/runtime-filter.h File be/src/runtime/runtime-filter.h: Line 201: bool IsFull() const { return HasBloomFilter() && bloom_filter_ == NULL; } > i find the 'full' terminology a bit problematic, because we also have Bloom Sorry, I'm having some trouble parsing that first sentence - what's the relationship between Insert() and IsFull() ? In general I don't like 'invalid' so much because it suggests there's an unexpected error (should the query stop when it gets an invalid filter?) where full filters are just a particular instance of a filter and so logically shouldn't be treated any differently. http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: Line 31: // are 64 = 2^6 bytes in a cache line. > could you add a static_assert for that? Not sure exactly what you want asserted - that 1 << LOG_BUCKET_WORD_BITS == 64? Line 64: void BloomFilter::ToThrift(TBloomFilter* thrift) const { > while you're fixing things: you should point out in the .h file that this i I don't think it is destructive - the string constructor copies directory_, not the pointer. http://gerrit.cloudera.org:8080/#/c/2475/3/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: Line 549: 4: required bool is_full > instead of introducing a new concept (full), couldn't we phrase that as eff I thought about that. My feeling was that encoding a full filter as one with size 0 is not representationally consistent (does a size 0 vector have all its bits set, or none of them?), and therefore requires just as much logic to deal with. as you need to special case directory.size() == 0. In that case it's better, in my opinion, to explicitly record the fact that the filter is full. -- To view, visit http://gerrit.cloudera.org:8080/2475 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I04b3e6542651c1e7b77a9bab01d0e3d9506af42f Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Henry Robinson <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-HasComments: Yes
