Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3902: Scheduler improvements for running multiple 
fragment instances on a single backend
......................................................................


Patch Set 3:

(22 comments)

Partial review. I'm posting this because it sounds like you'll be addressing 
the existing changes in the coordinator and I might as well get my related 
comments in before that happens.

I'm still trying to reason about the correctness of the coordinator-fragment
handling. That has always been a source of pain, and will likely continue to be 
until we can treat it exactly like any other remote fragment. It's great that 
you started moving in that direction, but it gets a bit hairy to have a 
FragmentExecState for it while still sometimes interacting with the executor_ 
directly. Similarly, the logic around counting fragments in different ways in 
different places is a bit confusing (e.g. 
QuerySchedule::GetNumFragmentInstances() vs 
Coordinator::GetNumRemoteInstances()), where sometimes we subtract one for the 
coordinator and sometimes not. I know some of this will be cleaned up when we 
remove the non-MT paths, but I was having a hard time reasoning about the 
current code's correctness. I'll keep looking at that after the simplifications 
you said you're making.
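To illustrate the counting concern: one way to avoid the scattered "subtract one for the coordinator" logic would be to centralize it behind a single helper. This is a minimal sketch with hypothetical names (not the actual Impala API), just to show the shape of the idea:

```cpp
#include <cassert>

// Hypothetical sketch (illustrative names, not the actual Impala API):
// keep the "subtract one for the coordinator" adjustment in exactly one
// place instead of repeating it in QuerySchedule and Coordinator.
struct InstanceCounts {
  int total;                      // all fragment instances, coordinator included
  bool has_coordinator_fragment;  // false for e.g. parallel INSERT queries

  // Remote instances exclude the coordinator's instance, if there is one.
  int NumRemote() const {
    return has_coordinator_fragment ? total - 1 : total;
  }
};
```

With that, callers never need to know whether a coordinator fragment exists; they just ask for the count they mean.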

http://gerrit.cloudera.org:8080/#/c/4054/3//COMMIT_MSG
Commit Message:

nit: would you mind wrapping the text in commit messages to 80 characters? It 
is presented better in gerrit and other tools.


PS3, Line 19: fragent
fragment


http://gerrit.cloudera.org:8080/#/c/4054/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS3, Line 226:   TUniqueId fragment_instance_id_;
It's getting kind of confusing with all the ids and indexes. It might be 
helpful to group all the ids and indexes together.


PS3, Line 238: id_;
why is this an 'id' but the other monotonically increasing *idx_ fields below 
are 'idx'?


PS3, Line 244: state_idx_
This name isn't very clear. Maybe coord_finstance_state_idx_, coord_state_idx_ 
or something that makes it clearer what state this refers to.


PS3, Line 408:   if (is_mt_execution && has_coordinator_fragment) --result;
can you add a comment about why this is necessary in the MT case only?


PS3, Line 443: =
nit space


PS3, Line 560: If a coordinator fragment is requested (for most queries this 
will be the case, the
             :   // exception is parallel INSERT queries), 
this is a bit outdated since we know there is a coordinator fragment here, I 
think you can cut out this first part and say "Start the coordinator fragment 
before..."


PS3, Line 662:   VLOG_QUERY << "MtStartRemoteFInstances";
remove?


PS3, Line 664:  
nit extra space


Line 1389:   exec_summary_.__isset.nodes = true;
Maybe clear exec_summary_.nodes here or DCHECK if it's not already empty.


PS3, Line 1411: 
schedule.GetFragmentExecParams(fragment.id).instance_exec_params.size()
can you move this out of the node loop (e.g. int num_frag_instances) so it's 
more clear that this is a per-fragment value?


PS3, Line 1429:   VLOG_QUERY << "MtInitExecProfiles";
remove?


PS3, Line 1465: query_profile_->AddChild(data->averaged_profile, true);
It looks like we previously were inserting the averaged_profiles after the 
previous fragment's averaged_profile, i.e. we specified the optional 
'location' parameter. Is that something we don't need or want to be doing 
anymore?


Line 1502:   VLOG_QUERY << "MtExecRemoteFInstance";
remove?


PS3, Line 1517:       << " instance_id=" << exec_state->fragment_instance_id()
              :       << " 
indent


PS3, Line 2007:           state->impalad_address(), initial_usage);
              :       RuntimeProfile::Counter* mem_usage_counter =
              :           
state->profile()->GetCounter(PlanFragmentExecutor::PER_HOST_PEAK_MEM_COUNTER);
              :       if (mem_usage_counter != NULL && 
mem_usage_counter->value() > *mem_usage) {
              :         per_node_peak_mem_usage[state->impalad_address()] = 
mem_usage_counter->value();
              :       }
If I understand this correctly, the meaning of this counter changes since I 
think we select the peak of the instances. People often use this information 
for capacity planning. Not necessarily an issue but we may need to adjust our 
guidance. (I'm mostly thinking out loud here.)
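To make the semantic difference concrete, here is a small sketch (hypothetical names, not the patch's code) of the two aggregations a per-host "peak" could mean once multiple instances run on one host:

```cpp
#include <algorithm>
#include <vector>

// Illustrative sketch, not Impala code: with several fragment instances per
// host, "per-host peak memory" could mean either of the following.

// Max over instance peaks: what the reviewed code appears to keep. This
// under-reports usage if instances hit their peaks concurrently.
long MaxOfPeaks(const std::vector<long>& instance_peaks) {
  long peak = 0;
  for (long p : instance_peaks) peak = std::max(peak, p);
  return peak;
}

// Sum of instance peaks: an upper bound on concurrent usage, closer to what
// capacity planning usually wants.
long SumOfPeaks(const std::vector<long>& instance_peaks) {
  long total = 0;
  for (long p : instance_peaks) total += p;
  return total;
}
```

For capacity-planning guidance, the gap between the two values is exactly the information users would lose if only the max is reported.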


PS3, Line 2045: VLOG_QUERY
remove?


PS3, Line 2131: //
?


http://gerrit.cloudera.org:8080/#/c/4054/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

PS3, Line 346: num_instances
initialize this as well?


PS3, Line 455: const QuerySchedule& schedule,
             :     const FInstanceExecParams& params, int instance_state_idx,
             :     int fragment_instance_idx,
             :     const TNetworkAddress& coord, TExecPlanFragmentParams* 
rpc_params
nit: format these to avoid an extra line or two


http://gerrit.cloudera.org:8080/#/c/4054/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

PS3, Line 188:   // 0: multi-threaded execution mode, number of cores is the 
pool default
Not really relevant for this change, but I think we should probably have the 
pool default get applied at the coordinator and pushed out explicitly. This 
could lead to inconsistencies between nodes if impalads don't have pool 
configurations in sync or don't all get updated fast enough.
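As a sketch of what "apply the pool default at the coordinator" might look like (hypothetical names, not the thrift change itself): the coordinator resolves the sentinel value 0 against its own pool config once, and the RPC always carries an explicit core count, so executors never consult their local pool configuration:

```cpp
// Hypothetical sketch: resolve the pool's default core count once at the
// coordinator and ship the explicit value to executors, rather than letting
// each impalad interpret 0 against its possibly-stale local pool config.
struct PoolConfig {
  int default_cores;  // pool-wide default for multi-threaded execution
};

int ResolveCores(int requested_cores, const PoolConfig& coord_pool_cfg) {
  // 0 means "use the pool default" per the thrift comment under review.
  return requested_cores == 0 ? coord_pool_cfg.default_cores : requested_cores;
}
```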


-- 
To view, visit http://gerrit.cloudera.org:8080/4054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-HasComments: Yes
