Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3610: Account for memory used by filters in the coordinator ......................................................................
Patch Set 4: (10 comments) I think I finally fixed the final crash. The lifetimes of every member introduced with this patch seemed to be perfect but ASAN still complained of a use-after-free issue. I spoke to Alex, and he pointed out that the scanner threads that are spawned by the "main" thread may still be running after the Coordinator is destroyed. And that exactly was the issue. Thanks Alex. The filter_mem_tracker_ introduced with this patch is the child tracker of the RuntimeState::query_mem_tracker_ in the case where there is a coordinator fragment. At times, the main thread exits cleanly, destroying the coordinator on an error (like OOM), but the scanner threads are still running and access MemTracker::LogUsage() which touch the Coordinator::filter_mem_tracker_ causing a use-after-free. Looks like the lifetimes of many objects were not set concretely and we've been getting by without crashes due to shared pointers. Hopefully these lifetime issues will be fixed with the different patches that go in for the MT effort. http://gerrit.cloudera.org:8080/#/c/4066/2/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 2040: state->DiscardFilter(filter_mem_tracker_.get()); > initialize state->bloom_filter to an empty struct, then do the swap. Done http://gerrit.cloudera.org:8080/#/c/4066/4/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 2010: lock_guard<SpinLock> l(filter_lock_); > this is a very coarse lock. most of the code after l2037 is specific to sta I've left a TODO to add a per-object lock in FilterState. I'm not sure how I can reduce the holding time of filter_lock_ without the per state lock itself. Line 2037: state->ApplyUpdate(filter_updates_received_); > almost everything from here to l2056 updates 'state'. move into ApplyUpdate Done Line 2051: state->bloom_filter.reset(new TBloomFilter(params.bloom_filter)); > swap instead Done Line 2057: if (state->pending_count > 0) return; > follow with blank line, there is a logical break here (first you updated st Done Line 2108: pending_count = 0L; > why did you move this assignment? Yes it is necessary for correctness. I've added a comment why http://gerrit.cloudera.org:8080/#/c/4066/4/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: Line 104: for (int i = 0; i < in.directory.size(); ++i) { > single line? Done http://gerrit.cloudera.org:8080/#/c/4066/4/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: Line 82: void Or(const BloomFilter& other); > is this still getting called? No it isn't. Only tests use it now. Line 106: static void OrTBloomFilter(const TBloomFilter& in, TBloomFilter* out); > no need to repeat param types in function name. Done http://gerrit.cloudera.org:8080/#/c/4066/4/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test File testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test: Line 85: SET MEM_LIMIT=200MB; > can't this run with a smaller limit? Did some empirical runs and found that I could run it with as low as 140MB. -- To view, visit http://gerrit.cloudera.org:8080/4066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-HasComments: Yes
