Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3610: Account for memory used by filters in the 
coordinator
......................................................................


Patch Set 9:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/4066/9/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 262: /// A filter may be discarded if an always_true filter update is 
received, an OOM is hit,
> you use both disabled and discarded to mean the same thing (i think). let's
Done


Line 303:   /// Resets bloom_filter_ to NULL, sets pending_count_ to 0, 
releases any memory
> don't reference internal state in the comment of a public function (the rea
Done


Line 390:     // comprehensive logging of all mem-trackers.
> add TODO-MT: remove
Done


Line 718:     // Cast the const away to read non-const member functions. No 
modification is done.
> why do you need non-const access?
I can't call non-const functions otherwise, specifically state.targets() and 
state.src_fragment_instance_idxs(). They need to be non-const because they are 
written to elsewhere.

Is there a better way to do this?


Line 726:     for (const auto& target: *state.targets()) {
> no auto
Done


Line 2048:   for (auto& filter: filter_routing_table_) {
> why make this a ref?
I don't understad, versus?
Not having a ref would implicitly call a copy-constructor.


Line 2086:     DCHECK_GT(state->pending_count(), 0);
> from l2086 to l2090 can also go into ApplyUpdate()
Done


Line 2099:     // If the filter was discarded, we should send out an 
always_true filter immediately.
> this comment doesn't describe the following statement
Done. Removed the comment. The next line describes this anyway.


Line 2105:     // No more filters are pending on this filter ID. Create a 
distribution payload and
> "no more updates are..."
Done


Line 2107:     state->set_completion_time(query_events_->ElapsedTime());
> move into ApplyUpdate
This has to be set only once we are sure that we aren't going to wait for 
anymore updates. So this should be set only once after all updates are received 
or if a filter is disabled.


Line 2119:     if (state->desc().is_broadcast_join) {
> also swap in the filter in ApplyUpdate for broadcast joins, then you don't 
That would cause broadcast filters having to swap() twice, which is two times 
the memory copy which could be noticeable for large filters. Also, we would 
TryConsume() memory for broadcast filters too then.

It's probably okay since we process only one update for broadcast filters, but 
if we're okay with that I'll make the change quickly.


Line 2183:   pending_count_ = 0L;
> does the pending count still matter at this point (and if so, why?)
Not really. Setting it to 0 here has no more meaning. I removed this assignment.


http://gerrit.cloudera.org:8080/#/c/4066/9/be/src/util/bloom-filter-test.cc
File be/src/util/bloom-filter-test.cc:

Line 277:   BloomFilter::ToThrift(&bf3, &bf3_thrift);  
> trailing spaces
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes

Reply via email to