Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3141: Send full filters when filter production is 
disabled
......................................................................


Patch Set 3:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/blocking-join-node.h
File be/src/exec/blocking-join-node.h:

Line 97:   /// inside a subplan or the false-positive rate would be too high.
we can have more than one filter per build side, each with its own effective 
false-positive rate.

leave a TODO to differentiate more carefully. right now this is based on the 
logic in the partitioned hash join that disables filters based on row count, 
but the variable used to indicate that they're disabled lives here, which 
breaks the abstraction. the control flow is already hard to follow, so it's 
worth commenting a bit more.


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1723:         if (filter->IsFull() || reject_ratio < 
FLAGS_parquet_min_filter_reject_ratio) {
full/invalid filters shouldn't be part of filter_ctxs_


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 402:     // filter due to a spilled partition.
is there a TODO to separate spilling from filters?


Line 521:   BOOST_FOREACH(const FilterContext& ctx, filters_) {
use a range-based for instead?
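something like this (sketch only - the loop body stays whatever it is today):

  // C++11 range-based for, replacing the BOOST_FOREACH macro.
  for (const FilterContext& ctx : filters_) {
    // ... existing loop body ...
  }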


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 1992:     if (state->completion_time != 0) return;
it makes more sense to adjust the pending_count in that case to avoid confusion 
(and to comply with the comments in the .h)


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/runtime-filter.cc
File be/src/runtime/runtime-filter.cc:

Line 106:       if (it == consumed_filters_.end()) return;
how is this possible again? has it just not been registered yet?


Line 113:       // TODO: Avoid need for this copy.
this is for local targets. why is there a need for this copy?


Line 120:       // Take lock only to ensure no race with PublishGlobalFilter() 
- there's no need for
since this is a local target, is PublishGlobalFilter even called? (the 
coordinator should know about local targets)


Line 148:   if (it->second->filter_desc().is_broadcast_join && 
it->second->HasBloomFilter()) {
it would be nice to eliminate this code path in the coordinator as part of this 
patch, since you're already cleaning things up here


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

Line 201:   bool IsFull() const { return HasBloomFilter() && bloom_filter_ == 
NULL; }
i find the 'full' terminology a bit problematic, because we also have 
BloomFilter::Insert, which suggests a relationship that doesn't exist.

how about 'invalid'? that would also make it clear why you bail out as soon as 
the filter becomes invalid.
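e.g. (sketch only, exact name and comment wording up to you):

  /// Returns true if this filter cannot reject any rows, e.g. because filter
  /// production was disabled on the build side.
  bool IsInvalid() const { return HasBloomFilter() && bloom_filter_ == NULL; }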


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

Line 31:       // are 64 = 2^6 bytes in a cache line.
could you add a static_assert for that?
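e.g. (sketch - 'CACHE_LINE_SIZE' stands in for however the 64-byte assumption 
is actually named/derived in this file):

  // Fail at compile time if the cache-line assumption ever changes.
  static_assert(CACHE_LINE_SIZE == 64,
      "bucket layout assumes 64 = 2^6 bytes per cache line");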


Line 64: void BloomFilter::ToThrift(TBloomFilter* thrift) const {
while you're fixing things: you should point out in the .h file that this is 
destructive. (and if you want to, you can also add a bool 
BloomFilter::serialized_ and make sure that we don't call Insert or Or after 
ToThrift).
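roughly (sketch only - member and signature details are made up, and 
serialized_ would need to be mutable or ToThrift() made non-const):

  void BloomFilter::ToThrift(TBloomFilter* thrift) const {
    DCHECK(!serialized_) << "ToThrift() called twice";
    // ... existing (destructive) serialization ...
    serialized_ = true;
  }

  void BloomFilter::Insert(const uint32_t hash) {
    DCHECK(!serialized_) << "Insert() after ToThrift()";
    // ... existing insert logic ...
  }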


http://gerrit.cloudera.org:8080/#/c/2475/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 549:   4: required bool is_full
instead of introducing a new concept (full), couldn't we express that just as 
effectively by saying the filter has size 0?


-- 
To view, visit http://gerrit.cloudera.org:8080/2475
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I04b3e6542651c1e7b77a9bab01d0e3d9506af42f
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-HasComments: Yes
