Lars Volker has posted comments on this change.

Change subject: IMPALA-4014: PARALLEL HEADERS ONLY: Introduce query-wide 
execution state.
......................................................................


Patch Set 1:

(35 comments)

Please see my inline comments. Regarding the runtime-state I can see how it 
would be more cache efficient the way it is right now. However I think it is 
worth cleaning it up, and then addressing any subsequent cache-related 
performance issues by re-grouping and re-ordering members of QueryState.

http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

Line 38: /// FInstanceState handles all aspects of the execution of a single 
plan
nit: line wrapping looks a bit off in this comment (and maybe elsewhere).


PS1, Line 39: fragment
finstance


PS1, Line 44: fragment
finstance?


PS1, Line 68:   /// It is an error to delete a FInstanceState before 
Open()/GetNext() (depending
            :   /// on whether the fragment has a sink) indicated that 
execution is finished.
This part is unclear to me. Maybe it'll get clearer while reading through the 
code. What if Cancel() is called before Prepare() even?


Line 106:   /// Start execution. Call this prior to GetNext().
How does calling Cancel() affect Open()?


http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

PS1, Line 79: RegisterFInstance
Should this be renamed to include the fact that it does more than Register the 
instance (i.e. call methods on it)? ExecuteFInstance()? 
ScheduleFinstanceExecution()? RegisterAndExecuteFInstance()?


PS1, Line 81:   /// Cancels a plan fragment that is running asynchronously.
            :   Status CancelPlanFragment(const TUniqueId& finstance_id);
            : 
            :   /// Publishes a global runtime filter to a local fragment.
            :   Status PublishFilter(const TUniqueId& dst_instance_id, int32_t 
filter_id,
            :       const TBloomFilter& thrift_bloom_filter);
these look like forwarding methods. Would it be possible to make 
GetFInstanceState() public and let the caller call them?


PS1, Line 111: controlled
nit: used


PS1, Line 111: QueryExecMgr
mention it's a friend, so it's clear how it works without getters/setters?


http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/runtime/runtime-state-new.h
File be/src/runtime/runtime-state-new.h:

PS1, Line 74: cgroup
should we explain what these parameters do? It's not clear to me at all what 
cgroup does.


PS1, Line 85: fragment
finstance?


PS1, Line 157: one
nit: maybe add 1 to...


Line 164:   /// Returns runtime state profile
I don't think these getter comments are needed here - or we should add them 
everywhere in the class.


Line 174:   /// Returns codegen_ in 'codegen'. If 'initialize' is true, 
codegen_ will be created if
Reword both sentences to "if 'initialize' is true and codegen_ has not been 
initialized, it will be created and returned via 'codegen'." and "If 
'initialize' is false and codegen_ has not been initialized, then 'codegen' 
will be set to NULL."


PS1, Line 200: /// Log an error that will be sent back to the coordinator based 
on an instance of the
             :   /// ErrorMsg class. The runtime state aggregates log messages 
based on type with one
             :   /// exception: messages with the GENERAL type are not 
aggregated but are kept
             :   /// individually.
             :   bool LogError(const ErrorMsg& msg, int vlog_level = 1);
             : 
             :   /// Returns true if the error log has not reached max_errors_.
             :   bool LogHasSpace() {
             :     boost::lock_guard<SpinLock> l(error_log_lock_);
             :     return error_log_.size() < query_options().max_errors;
             :   }
             : 
             :   /// Returns the error log lines as a string joined with '\n'.
             :   std::string ErrorLog();
             : 
             :   /// Copy error_log_ to *errors
             :   void GetErrors(ErrorLogMap* errors);
             : 
             :   /// Append all accumulated errors since the last call to this 
function to new_errors to
             :   /// be sent back to the coordinator
             :   void GetUnreportedErrors(ErrorLogMap* new_errors);
             : 
             :   /// Given an error message, determine whether execution should 
be aborted and, if so,
             :   /// return the corresponding error status. Otherwise, log the 
error and return
             :   /// Status::OK(). Execution is aborted if the ABORT_ON_ERROR 
query option is set to
             :   /// true or the error is not recoverable and should be handled 
upstream.
             :   Status LogOrReturnError(const ErrorMsg& message);
Can these go into an own struct/class?


PS1, Line 231: RuntimeProfile::Counter* total_cpu_timer() { return 
total_cpu_timer_; }
             :   RuntimeProfile::Counter* total_storage_wait_timer() {
             :     return total_storage_wait_timer_;
             :   }
             :   RuntimeProfile::Counter* total_network_send_timer() {
             :     return total_network_send_timer_;
             :   }
             :   RuntimeProfile::Counter* total_network_receive_timer() {
             :     return total_network_receive_timer_;
             :   }
Can the counters go into an own struct/class?


Line 252:   void LogMemLimitExceeded(const MemTracker* tracker, int64_t 
failed_allocation_size);
This could to into the Log struct/class (above)


Line 269:   Status CheckQueryState();
Can this be const? If not a comment should explain why.


Line 279:   Status Init(ExecEnv* exec_env);
Can this be combined with other init methods to simplify initialization?


Line 290:   static const int DEFAULT_BATCH_SIZE = 1024;
comment? What is the unit?


PS1, Line 292:   FInstanceState* finstance_state_;
             :   DescriptorTbl* desc_tbl_;
             :   boost::scoped_ptr<ObjectPool> obj_pool_;
comments?


PS1, Line 300: ErrorLogMap
type and name mismatch, why is this a Map?


PS1, Line 309:   ExecEnv* exec_env_;
             :   boost::scoped_ptr<LlvmCodeGen> codegen_;
Newline above. Comments?


PS1, Line 312:   /// True if this fragment should force codegen for expr 
evaluation.
This comment does not match the one at the setter method (force vs use).


PS1, Line 326:   RuntimeProfile profile_;
             : 
             :   /// Total CPU time (across all threads), including all wait 
times.
             :   RuntimeProfile::Counter* total_cpu_timer_;
             : 
             :   /// Total time waiting in storage (across all threads)
             :   RuntimeProfile::Counter* total_storage_wait_timer_;
             : 
             :   /// Total time spent sending over the network (across all 
threads)
             :   RuntimeProfile::Counter* total_network_send_timer_;
             : 
             :   /// Total time spent receiving over the network (across all 
threads)
             :   RuntimeProfile::Counter* total_network_receive_timer_;
These could go to into an own struct/class.


PS1, Line 341: must be released after the instance_mem_tracker_
How do we guarantee this if an instance of this class is the last one holding 
on to query_mem_tracker_?


Line 360:   /// Reader contexts that need to be closed when the fragment is 
closed.
Who creates them? Who closes them?


PS1, Line 381:   /// prohibit copies
Use DISALLOW_COPY_AND_ASSIGN instead?


http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/service/query-exec-mgr.h
File be/src/service/query-exec-mgr.h:

Line 31: /// Manages execution of queries by creating a QueryState object which 
owns
Mention that exactly one instance of this exists per process? I think that was 
a previous comment by Marcel, too: 
https://gerrit.cloudera.org/#/c/3817/11/be/src/service/query-exec-mgr.h@31


PS1, Line 36: passed
Can we find a better word here? 'passed' seems to imply that it's created 
outside and then just registered.


PS1, Line 95: class object
what does "class object" mean?


Line 105:       query_state_ = 
ExecEnv::GetQueryExecMgr()->GetQueryState(query_id);
Adding DCHECK(ExecEnv::GetQueryExecMgr() != nullptr) might help spot issues 
with incomplete ExecEnv in test setups.


Line 109:       ExecEnv::GetQueryExecMgr()->ReleaseQueryState(query_state_);
Same DCHECK here, plus it might catch errors with the order in which objects 
are destroyed during process shutdown.


PS1, Line 119: exits
nit: is destructed?


PS1, Line 138: /// Calls finstance_state->Prepare() and then 
finstance_state->state->Exec().
             :   /// The finstance_state must previously have been registered.
             :   /// Runs in the fragment instance' execution thread. Calls 
ReleaseQueryState() to
             :   /// decrease the refcount to the QueryState, which was 
reserved during fragment instance
             :   /// registration time.
             :   void ExecInstance(QueryState* query_state, FInstanceState* 
finstance_state);
Should we move the whole method to the QueryState or even FInstanceState? If 
not, can we explain in a comment why not?


-- 
To view, visit http://gerrit.cloudera.org:8080/4301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If58292a5c377660d97e7e7cc0c3122328eba72ed
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes

Reply via email to