Lars Volker has posted comments on this change. Change subject: IMPALA-4014: PARALLEL HEADERS ONLY: Introduce query-wide execution state. ......................................................................
Patch Set 1: (35 comments) Please see my inline comments. Regarding the runtime-state I can see how it would be more cache efficient the way it is right now. However I think it is worth cleaning it up, and then addressing any subsequent cache-related performance issues by re-grouping and re-ordering members of QueryState. http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/runtime/fragment-instance-state.h File be/src/runtime/fragment-instance-state.h: Line 38: /// FInstanceState handles all aspects of the execution of a single plan nit: line wrapping looks a bit off in this comment (and maybe elsewhere). PS1, Line 39: fragment finstance PS1, Line 44: fragment finstance? PS1, Line 68: /// It is an error to delete a FInstanceState before Open()/GetNext() (depending : /// on whether the fragment has a sink) indicated that execution is finished. This part is unclear to me. Maybe it'll get clearer while reading through the code. What if Cancel() is called before Prepare() even? Line 106: /// Start execution. Call this prior to GetNext(). How does calling Cancel() affect Open()? http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/runtime/query-state.h File be/src/runtime/query-state.h: PS1, Line 79: RegisterFInstance Should this be renamed to include the fact that it does more than Register the instance (i.e. call methods on it)? ExecuteFInstance()? ScheduleFinstanceExecution()? RegisterAndExecuteFInstance()? PS1, Line 81: /// Cancels a plan fragment that is running asynchronously. : Status CancelPlanFragment(const TUniqueId& finstance_id); : : /// Publishes a global runtime filter to a local fragment. : Status PublishFilter(const TUniqueId& dst_instance_id, int32_t filter_id, : const TBloomFilter& thrift_bloom_filter); these look like forwarding methods. Would it be possible to make GetFInstanceState() public and let the caller call them? PS1, Line 111: controlled nit: used PS1, Line 111: QueryExecMgr mention it's a friend, so it's clear how it works without getters/setters? http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/runtime/runtime-state-new.h File be/src/runtime/runtime-state-new.h: PS1, Line 74: cgroup should we explain what these parameters do? It's not clear to me at all what cgroup does. PS1, Line 85: fragment finstance? PS1, Line 157: one nit: maybe add 1 to... Line 164: /// Returns runtime state profile I don't think these getter comments are needed here - or we should add them everywhere in the class. Line 174: /// Returns codegen_ in 'codegen'. If 'initialize' is true, codegen_ will be created if Reword both sentences to "if 'initialize' is true and codegen_ has not been initialized, it will be created and returned via 'codegen'." and "If 'initialize' is false and codegen_ has not been initialized, then 'codegen' will be set to NULL." PS1, Line 200: /// Log an error that will be sent back to the coordinator based on an instance of the : /// ErrorMsg class. The runtime state aggregates log messages based on type with one : /// exception: messages with the GENERAL type are not aggregated but are kept : /// individually. : bool LogError(const ErrorMsg& msg, int vlog_level = 1); : : /// Returns true if the error log has not reached max_errors_. : bool LogHasSpace() { : boost::lock_guard<SpinLock> l(error_log_lock_); : return error_log_.size() < query_options().max_errors; : } : : /// Returns the error log lines as a string joined with '\n'. : std::string ErrorLog(); : : /// Copy error_log_ to *errors : void GetErrors(ErrorLogMap* errors); : : /// Append all accumulated errors since the last call to this function to new_errors to : /// be sent back to the coordinator : void GetUnreportedErrors(ErrorLogMap* new_errors); : : /// Given an error message, determine whether execution should be aborted and, if so, : /// return the corresponding error status. Otherwise, log the error and return : /// Status::OK(). Execution is aborted if the ABORT_ON_ERROR query option is set to : /// true or the error is not recoverable and should be handled upstream. : Status LogOrReturnError(const ErrorMsg& message); Can these go into an own struct/class? PS1, Line 231: RuntimeProfile::Counter* total_cpu_timer() { return total_cpu_timer_; } : RuntimeProfile::Counter* total_storage_wait_timer() { : return total_storage_wait_timer_; : } : RuntimeProfile::Counter* total_network_send_timer() { : return total_network_send_timer_; : } : RuntimeProfile::Counter* total_network_receive_timer() { : return total_network_receive_timer_; : } Can the counters go into an own struct/class? Line 252: void LogMemLimitExceeded(const MemTracker* tracker, int64_t failed_allocation_size); This could to into the Log struct/class (above) Line 269: Status CheckQueryState(); Can this be const? If not a comment should explain why. Line 279: Status Init(ExecEnv* exec_env); Can this be combined with other init methods to simplify initialization? Line 290: static const int DEFAULT_BATCH_SIZE = 1024; comment? What is the unit? PS1, Line 292: FInstanceState* finstance_state_; : DescriptorTbl* desc_tbl_; : boost::scoped_ptr<ObjectPool> obj_pool_; comments? PS1, Line 300: ErrorLogMap type and name mismatch, why is this a Map? PS1, Line 309: ExecEnv* exec_env_; : boost::scoped_ptr<LlvmCodeGen> codegen_; Newline above. Comments? PS1, Line 312: /// True if this fragment should force codegen for expr evaluation. This comment does not match the one at the setter method (force vs use). PS1, Line 326: RuntimeProfile profile_; : : /// Total CPU time (across all threads), including all wait times. : RuntimeProfile::Counter* total_cpu_timer_; : : /// Total time waiting in storage (across all threads) : RuntimeProfile::Counter* total_storage_wait_timer_; : : /// Total time spent sending over the network (across all threads) : RuntimeProfile::Counter* total_network_send_timer_; : : /// Total time spent receiving over the network (across all threads) : RuntimeProfile::Counter* total_network_receive_timer_; These could go to into an own struct/class. PS1, Line 341: must be released after the instance_mem_tracker_ How do we guarantee this if an instance of this class is the last one holding on to query_mem_tracker_? Line 360: /// Reader contexts that need to be closed when the fragment is closed. Who creates them? Who closes them? PS1, Line 381: /// prohibit copies Use DISALLOW_COPY_AND_ASSIGN instead? http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/service/query-exec-mgr.h File be/src/service/query-exec-mgr.h: Line 31: /// Manages execution of queries by creating a QueryState object which owns Mention that exactly one instance of this exists per process? I think that was a previous comment by Marcel, too: https://gerrit.cloudera.org/#/c/3817/11/be/src/service/query-exec-mgr.h@31 PS1, Line 36: passed Can we find a better word here? 'passed' seems to imply that it's created outside and then just registered. PS1, Line 95: class object what does "class object" mean? Line 105: query_state_ = ExecEnv::GetQueryExecMgr()->GetQueryState(query_id); Adding DCHECK(ExecEnv::GetQueryExecMgr() != nullptr) might help spot issues with incomplete ExecEnv in test setups. Line 109: ExecEnv::GetQueryExecMgr()->ReleaseQueryState(query_state_); Same DCHECK here, plus it might catch errors with the order in which objects are destroyed during process shutdown. PS1, Line 119: exits nit: is destructed? PS1, Line 138: /// Calls finstance_state->Prepare() and then finstance_state->state->Exec(). : /// The finstance_state must previously have been registered. : /// Runs in the fragment instance' execution thread. Calls ReleaseQueryState() to : /// decrease the refcount to the QueryState, which was reserved during fragment instance : /// registration time. : void ExecInstance(QueryState* query_state, FInstanceState* finstance_state); Should we move the whole method to the QueryState or even FInstanceState? If not, can we explain in a comment why not? -- To view, visit http://gerrit.cloudera.org:8080/4301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If58292a5c377660d97e7e7cc0c3122328eba72ed Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-HasComments: Yes
