Sailesh Mukil has posted comments on this change. Change subject: IMPALA-4014: Introduce query-wide execution state. ......................................................................
Patch Set 10: (52 comments) http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/runtime/fragment-instance-exec-state.h File be/src/runtime/fragment-instance-exec-state.h: Line 32: /// Execution state of a single plan fragment. > instance Done PS10, Line 33: FragmentInstanceExecState > Let's consider FragmentInstanceState and QueryState (if they're not executi Done PS10, Line 33: FragmentInstanceExecState > i think that's a good idea, long names aren't necessarily more readable or Done PS10, Line 39: lambda( : boost::mem_fn(&FragmentInstanceExecState::ReportStatusCb), : this, _1, _2, _3) > Try and find a clearer way to write this. If you don't want to write the la I'm still not very well versed on the C++ lambda syntax. My next step is to create a compilable version of this patch. I'll make the changes to this in the next patch set so that I get the syntax right and make sure that it compiles. Also, I don't want to hold up reviewers while I figure this out. PS10, Line 80: ImpalaBackendClientCache* client_cache_; > Is this necessary given that ExecEnv can give us the same pointer? No. I've changed it. PS10, Line 77: QueryExecState* query_exec_state_; // not owned : TPlanFragmentInstanceCtx fragment_instance_ctx_; : FragmentInstanceExecutor executor_; : ImpalaBackendClientCache* client_cache_; : TExecPlanFragmentParams exec_params_; > Good opportunity to add some comments for these. Done PS10, Line 83: plan fragment > fragment instance. Done http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/runtime/fragment-instance-executor.cc File be/src/runtime/fragment-instance-executor.cc: PS10, Line 80: Close > Since we are aiming for deterministic, manual management of finst lifetimes It already does get called explicitly and this call shouldn't be necessary. I changed it to a DCHECK. Line 349: // We also want to call sink_->Close() here rather than in FragmentInstanceExecutor::Close > long line Done http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/runtime/fragment-instance-executor.h File be/src/runtime/fragment-instance-executor.h: Line 48: /// FragmentInstanceExecutor handles all aspects of the execution of a single plan fragment, > long line Done PS10, Line 155: ExecEnv* exec_env_; // not owned > Can you remove this? ExecEnv::GetInstance() works fine. Done http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/runtime/query-exec-state.cc File be/src/runtime/query-exec-state.cc: PS10, Line 60: > remove space Done PS10, Line 73: DCHECK_EQ(fragment_inst_exec_state_map_.find(GetInstanceIdx(fragment_instance_id)), : fragment_inst_exec_state_map_) > these aren't the same type (does this compile?). Oops, sorry. Meant to compare it with map_.end(). Changed it. I haven't gotten this to compile yet, so there may be a few syntax errors that I've missed. PS10, Line 76: FragmentInstanceExecState* fragment_inst_exec_state = : new FragmentInstanceExecState(exec_params, this, ExecEnv::GetInstance()); > Presumably this should be managed by obj_pool. If not, the value type of th Are there any arguments for or against a unique_ptr? It seems to fit the use here perfectly: - We would never want to transfer ownership of the FragmentInstanceState objects. - We want its lifetime to be as long as the QueryState. If there are no objections, I can change it to use a unique_ptr. PS10, Line 80: his RPC returns > This isn't an RPC any more. Update comment. Done PS10, Line 113: lexical_cast > PrintId() Done PS10, Line 128: lexical_cast > PrintId() Done http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/runtime/query-exec-state.h File be/src/runtime/query-exec-state.h: Line 38: /// This class is responsible for creating, running and destroying FragmentInstanceExecStates. > Talk about lifetime here: when is this created, when is it destroyed, what Done PS10, Line 41: // Special constructor for the coordinator. > TODO: remove. Done PS10, Line 59: CleanFragmentInstanceStates > Don't talk about implementation details (cleaning the map). When would this Done Line 61: /// Registers a new FragmentInstanceExecState and launches the thread that calls Prepare() and > long line. Done PS10, Line 90: std::unique_ptr<ObjectPool*> obj_pool_ > Comment broadly what this is used to manage, since that affects lifecycles. It doesn't manage anything yet in this patch because we decided to bring in the shared state across fragments (DescriptorTbl) in a future patch. The reason is that the serialization/deserialization of the DescriptorTbl needs to be modified quite a bit to become a general DescTbl across all fragments. So should I remove this since it won't be used for now? Or should I have the FragmentExecStates be managed by the ObjectPool? If so, what's the advantage of that? PS10, Line 90: * > do you mean unique_ptr<ObjectPool> ? Oops, yes. Changed it. PS10, Line 90: Owned > redundant wrt unique_ptr Done PS10, Line 94: active > expand. What does it mean for a fragment to be active? Done PS10, Line 105: shared_ptr > Update comment. Done PS10, Line 109: FragmentInstanceExecState > Are the exec states managed by the obj pool? If so, say so otherwise the li No they currently aren't because the lifetime of the FragmentInstanceState seems very clear, i.e. it's tied to the lifetime of the QueryState. Is there any advantage of adding it to the object pool? PS10, Line 113: shared pointer > update comment Done http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/service/client-request-exec-state.cc File be/src/service/client-request-exec-state.cc: Line 17: #include "service/client-request-exec-state.h" > add a blank line before this one Done http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/service/client-request-exec-state.h File be/src/service/client-request-exec-state.h: PS10, Line 48: This is responsible for creating the : /// coordinator object > Seems unnecessary. Done http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/service/impala-internal-service.h File be/src/service/impala-internal-service.h: PS10, Line 48: fragment_mgr_->CancelPlanFragment(params.query_id, : params.fragment_instance_id); > Copy the one-line idiom from other RPCs, and set the TStatus directly witho Done PS10, Line 74: Ignoring returned status here as failing to publish a filter is not fatal. > It would be better to add a TStatus to the return struct, and have the call Done http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/service/query-exec-mgr.cc File be/src/service/query-exec-mgr.cc: PS10, Line 40: bind<void>(mem_fn( > please avoid in favour of lambda expressions, to reduce dependency on boost Likewise, I'll make this change in the next patchset which is compilable. PS10, Line 74: fragment_inst_exec_state->set_exec_thread(new Thread("query-exec-state", : Substitute("exec-plan-fragment-$0", PrintId(fragment_id)), : &QueryExecMgr::ExecInstance, this, fragment_inst_exec_state)); > Why is the QEM running the thread, and not the QES? If it's to ensure a ref We had a discussion about this yesterday. I've changed it like this for the following reason: Previously, the ExecInstance() thread was run by the QES. So, the QEM used to reserve refcounts (increase) and the QES used to decrement the refcounts in ExecInstance() by calling ReleaseQES(*this). This means that the refcount is increased in one level(QEM) and decreased in another level(QES). Keeping it this way means that the QEM does both the increase/decrease of the refcounts. This will be shortlived as there are plans to include a QueryExecutor class (similar to PlanFragmentExecutor, but for a query), that will take care of query execution and this logic will be moved there instead. Line 123: DCHECK(i->second->num_current_references_ != 0); > DCHECK_GT Done Line 142: query_exec_state_map_.find(query_id); > one line? Done PS10, Line 146: std > remove std:: This is not a unique_ptr anymore. Removed it. PS10, Line 148: ++(i->second->num_current_references_); > Who returns this reference? The caller of FindOrCreate...() can't tell if i The caller doesn't need to know that. Why would it? PS10, Line 159: lexical_cast > PrintId Done PS10, Line 167: this > QESGuard() takes a QEM*. Did this compile? No, this was a mistake. I've removed this argument anyway. http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/service/query-exec-mgr.h File be/src/service/query-exec-mgr.h: PS10, Line 31: shared > 'Shared' suggests ownership. Better is to describe that the QES manages, or Done Line 45: /// does for RegisterFragmentInstanceWithQuery()'s return value. > I think after the batching change the QEM shouldn't ever know about fragmen Done PS10, Line 46: QueryExecMgr > please instrument this class with more metrics as needed (e.g. number of ru Done PS10, Line 76: Status CancelPlanFragment(const TUniqueId& query_id, : const TUniqueId& fragment_instance_id); > Remind me why we have this shorthand for: I'm sorry I don't follow the question. Could you clarify? PS10, Line 84: c'tor > this comment is describing a class, not a c'tor. Not sure what you're tryin I meant class object. Changed it. PS10, Line 87: nothing > imprecise - talk in terms of invariants on methods, e.g. query_exec_state() Done PS10, Line 90: /// This class makes sure that the caller cannot directly access the QueryExecState : /// and allowing access to the QES only through the APIs provided from this class. : /// This is so that the caller doesn't accidentaly try and transfer ownership of the : /// QES. : /// TODO: Is this really necessary? The downside is that this class needs to duplicate : /// function signatures for every function the QES wants to expose. > Isn't this comment no longer true given QESGuard::query_exec_state() ? Yes, I forgot to update the comment. Removed it. PS10, Line 99: QueryExecMgr > why is this an argument? Surely QEM is a process-wide singleton, and we can I've made the ExecEnv hold a pointer to the QEM and removed this argument. Line 112: QueryExecState* query_exec_state_; > add DISALLOW_COPY_AND_ASSIGN, otherwise you have the same leakage problems Done PS10, Line 124: QueryExecStateMap > can't get on one line? Done PS10, Line 127: /// The thread pool that is in charge of destroying a QueryExecState when it is no : /// longer in use. : ThreadPool<QueryExecState*> destroy_thread_; > Do we have evidence that this is necessary, or should we remove this mechan Currently, destruction of the QES is not a very heavyweight operation. It will be in the near future though (after IMPALA-2550). I could leave it as it is, or it could be added in a future patch. I'm okay with either. Line 137: void DestroyQueryExecState(uint32_t thread_id, QueryExecState* query_exec_state); > mention that it's invoked by the thread pool. Done -- To view, visit http://gerrit.cloudera.org:8080/3817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: David Knupp <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-HasComments: Yes
