Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3610: Account for memory used by filters in the 
coordinator
......................................................................


Patch Set 4:

(10 comments)

I think I finally fixed the final crash. The lifetimes of every member 
introduced with this patch seemed to be perfect but ASAN still complained of a 
use-after-free issue.

I spoke to Alex, and he pointed out that the scanner threads that are spawned 
by the "main" thread may still be running after the Coordinator is destroyed. 
And that exactly was the issue. Thanks Alex.

The filter_mem_tracker_ introduced with this patch is the child tracker of the 
RuntimeState::query_mem_tracker_ in the case where there is a coordinator 
fragment. At times, the main thread exits cleanly, destroying the coordinator 
on an error (like OOM), but the scanner threads are still running and access 
MemTracker::LogUsage() which touch the Coordinator::filter_mem_tracker_ causing 
a use-after-free.

Looks like the lifetimes of many objects were not set concretely and we've been 
getting by without crashes due to shared pointers. Hopefully these lifetime 
issues will be fixed with the different patches that go in for the MT effort.

http://gerrit.cloudera.org:8080/#/c/4066/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 2040:         state->DiscardFilter(filter_mem_tracker_.get());
> initialize state->bloom_filter to an empty struct, then do the swap.
Done


http://gerrit.cloudera.org:8080/#/c/4066/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 2010:     lock_guard<SpinLock> l(filter_lock_);
> this is a very coarse lock. most of the code after l2037 is specific to sta
I've left a TODO to add a per-object lock in FilterState.

I'm not sure how I can reduce the holding time of filter_lock_ without the per 
state lock itself.


Line 2037:     state->ApplyUpdate(filter_updates_received_);
> almost everything from here to l2056 updates 'state'. move into ApplyUpdate
Done


Line 2051:           state->bloom_filter.reset(new 
TBloomFilter(params.bloom_filter));
> swap instead
Done


Line 2057:     if (state->pending_count > 0) return;
> follow with blank line, there is a logical break here (first you updated st
Done


Line 2108:   pending_count = 0L;
> why did you move this assignment?
Yes it is necessary for correctness. I've added a comment why


http://gerrit.cloudera.org:8080/#/c/4066/4/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

Line 104:   for (int i = 0; i < in.directory.size(); ++i) {
> single line?
Done


http://gerrit.cloudera.org:8080/#/c/4066/4/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

Line 82:   void Or(const BloomFilter& other);
> is this still getting called?
No it isn't. Only tests use it now.


Line 106:   static void OrTBloomFilter(const TBloomFilter& in, TBloomFilter* 
out);
> no need to repeat param types in function name.
Done


http://gerrit.cloudera.org:8080/#/c/4066/4/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test
File 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test:

Line 85: SET MEM_LIMIT=200MB;
> can't this run with a smaller limit?
Did some empirical runs and found that I could run it with as low as 140MB.


-- 
To view, visit http://gerrit.cloudera.org:8080/4066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes

Reply via email to