Henry Robinson has posted comments on this change. Change subject: PREVIEW: IMPALA-2550 Introduce query-wide execution context. ......................................................................
Patch Set 4: (30 comments) This is my first pass, and so doesn't capture all the style, comment etc issues. I'm first focusing on whether the ScopedQueryExecStateRef works as intended, and whether the lookup-by-fragment ID is required. I'm not yet convinced that this is coming out more cleanly than a shared_ptr with a custom deleter (which is effectively what we are doing here). http://gerrit.cloudera.org:8080/#/c/3817/4/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS4, Line 418: // TODO: Should we create a QueryExecState for the coordinator fragment? yes, otherwise you wind up with special-case logic to handle the paths where the QES is null. Why does this QES not go in the local QEM? http://gerrit.cloudera.org:8080/#/c/3817/4/be/src/runtime/query-exec-state.cc File be/src/runtime/query-exec-state.cc: PS4, Line 71: QueryExecMgr::ScopedQueryExecStateRef s(query_exec_mgr, fragment_instance_id); This is confusing: presumably this exists to ensure that *this has a lifetime at least as long as this method. But if there's a danger that *this will be destroyed during this method, that could happen before this line ever gets executed. So this won't have a reference to anything (and will actually crash). Why not have a fragment exec state add a reference to the QES that is decremented when the FES completes? Then the reference is guaranteed to be > 0 after line 49 above, and you don't need to take the reference here. http://gerrit.cloudera.org:8080/#/c/3817/4/be/src/runtime/query-exec-state.h File be/src/runtime/query-exec-state.h: Line 1: // Copyright 2016 Cloudera Inc. Fix license headers here and elsewhere. Line 36: /// The lifetime of this class is maintained by QueryExecMgr. No need to talk about what owners of this class might do with it. PS4, Line 41: NULL nit: use nullptr now. PS4, Line 84: std::shared_ptr Why shared? PS4, Line 101: boost std http://gerrit.cloudera.org:8080/#/c/3817/4/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: Line 274: QueryExecState* query_exec_state_; // not owned comment what this is. http://gerrit.cloudera.org:8080/#/c/3817/4/be/src/service/client-request-exec-state.cc File be/src/service/client-request-exec-state.cc: Line 14: #include "service/request-exec-state.h" add new line before this one Line 570: "query-exec-state", "wait-thread", &ImpalaServer::ClientRequestExecState::Wait, this)); long line Line 677: void ImpalaServer::ClientRequestExecState::UpdateNonErrorQueryState(QueryState::type query_state) { long line Line 829: Status ImpalaServer::ClientRequestExecState::GetRowValue(TupleRow* row, vector<void*>* result, long line http://gerrit.cloudera.org:8080/#/c/3817/4/be/src/service/query-exec-mgr.cc File be/src/service/query-exec-mgr.cc: PS4, Line 37: bind<void>(mem_fn(&QueryExecMgr::DestroyQueryExecState), this, _1, _2) avoid bind, prefer lambdas PS4, Line 48: const TUniqueId& query_id = exec_params.query_ctx.query_id; : // Preparing and opening the fragment creates a thread and consumes a non-trivial : // amount of memory. If we are already starved for memory, cancel the fragment as : // early as possible to avoid digging the hole deeper. : MemTracker* process_mem_tracker = ExecEnv::GetInstance()->process_mem_tracker(); : if (process_mem_tracker->LimitExceeded()) { : string msg = Substitute("Instance $0 of plan fragment $1 of query $2 could not " : "start because the backend Impala daemon is over its memory limit", : PrintId(exec_params.fragment_instance_ctx.fragment_instance_id), : exec_params.fragment_ctx.fragment.display_name, : PrintId(query_id)); : return process_mem_tracker->MemLimitExceeded(NULL, msg, 0); : } this logic looks like it should live inside RegisterFragmentInstance PS4, Line 65: // Create or retrieve the QueryExecState for this query id. : QueryExecState* query_exec_state = FindOrInsertQueryExecState(exec_params.query_ctx); : DCHECK(query_exec_state != NULL); : : // This is a no-op if another fragment of the same query already called Init(). : query_exec_state->Init(); Why not change the interface to the following: // Creates a QES if one doesn't exist. qes = QEM->GetQueryExecState(query_id); qes->RegisterFragment(fragment_id); Then the Init() is only ever done in one place (GetQES()), and you directly interact with the QES itself. PS4, Line 103: i->second->num_current_references_++; ++ PS4, Line 107: void QueryExecMgr::ReleaseQueryExecState(QueryExecState* query_exec_state) { : lock_guard<SpinLock> l(query_exec_state_map_lock_); : query_exec_state->num_current_references_--; : // If this thread was the last to use it, schedule it for destruction. : if (query_exec_state->num_current_references_ == 0) { : destroy_thread_.Offer(query_exec_state); : } : } A QES will never be destroyed unless at least one reference is ever taken to it. That works in this patch, because a fragment will always take at least one reference, but it's a bit subtle. Can you think of a way to make the guarantee of at-least-one-ref more explicit? PS4, Line 133: std remove std:: PS4, Line 137: query_exec_state_map_[query_id].get() nit: you're doing the lookup twice. insert returns a pair <iterator, bool> - just do: i = qesm_.insert(...).first; on line 134 and on this line: return i->second.get(); PS4, Line 142: params.fragment_instance_id The right thing to do is to augment the rpc params with the query ID, so you don't have to maintain the fragment->query map. http://gerrit.cloudera.org:8080/#/c/3817/4/be/src/service/query-exec-mgr.h File be/src/service/query-exec-mgr.h: Line 1: // Copyright 2014 Cloudera Inc. Replace header with ASF one. PS4, Line 29: fragments fragment PS4, Line 40: /// TODO: Remove Thrift args from methods where it would improve readability; : /// ImpalaInternalService can take care of translation to / from Thrift, as it already : /// does for RegisterFragmentInstanceWithQuery()'s return value. Can you do that in this patch? (Already noticed one instance below). Getting the interface right is critical for this patch. PS4, Line 61: void CancelPlanFragment(TCancelPlanFragmentResult& return_val, : const TCancelPlanFragmentParams& params); This shouldn't have RPC-style signature (since this isn't a proxy class): return a Status, have explicit parameters. PS4, Line 80: GetQueryExecStateFromFragmentId When do we know the fragment instance id, but not the query id? Line 91: class ScopedQueryExecStateRef { Comment? PS4, Line 94: const TUniqueId& fragment_id This should take a query ID. You should try to get rid of the fragment->query ID map and only access the QEM->QES->FES hierarchy from the top down (if I have a process I can get a QEM, if I have a query ID and a QEM can get a QES, if I have a QES and a fragment ID I can get an FES). PS4, Line 98: query_exec_state_ = query_exec_mgr_->GetQueryExecStateFromFragmentId(fragment_id); What prevents this from returning NULL? You need some mechanism to ensure that the QES actually exists and has not been torn down. PS4, Line 117: boost std PS4, Line 130: destroy_thread_ not just a single thread then. -- To view, visit http://gerrit.cloudera.org:8080/3817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-HasComments: Yes
