Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend ......................................................................
Patch Set 3: (22 comments) Partial review. I'm posting this because it sounds like you'll be addressing the existing changes in the coordinator and I might as well get my related comments in before that happens. I'm still trying to reason about is the correctness of the coordinator-fragment handling. That has always been a source of pain, and will likely continue to be until we can treat it exactly like any other remote fragment. It's great that you started moving in that direction, but it gets a bit hairy to have a FragmentExecState for it while still sometimes interacting with the executor_ directly. Similarly, the logic around counting fragments in different ways in different places is a bit confusing (e.g. QuerySchedule::GetNumFragmentInstances() vs Coordinator::GetNumRemoteInstaces()), where sometimes we subtract one for the coordinator and sometimes not. I know some of this will be cleaned up when we remove the non-MT paths, but I was having a hard time reasoning about the current code's correctness. I'll keep looking at that after the simplifications you said you're making. http://gerrit.cloudera.org:8080/#/c/4054/3//COMMIT_MSG Commit Message: nit: would you mind wrapping the text in commit messages to 80 characters? It is presented better in gerrit and other tools. PS3, Line 19: fragent fragment http://gerrit.cloudera.org:8080/#/c/4054/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS3, Line 226: TUniqueId fragment_instance_id_; It's getting kind of confusing with all the ids and indexes. It might be helpful to group all the ids and indexes together. PS3, Line 238: id_; why is this an 'id' but the other monotonically increasing *idx_ fields below are 'idx'? PS3, Line 244: state_idx_ This name isn't very clear. Maybe coord_finstance_state_idx_, coord_state_idx_ or something that makes it more clear what state this refers to. PS3, Line 408: if (is_mt_execution && has_coordinator_fragment) --result; can you add a comment about why this is necessary in the MT case only? PS3, Line 443: = nit space PS3, Line 560: If a coordinator fragment is requested (for most queries this will be the case, the : // exception is parallel INSERT queries), this is a bit outdated since we know there is a coordinator fragment here, I think you can cut out this first part and say "Start the coordinator fragment before..." PS3, Line 662: VLOG_QUERY << "MtStartRemoteFInstances"; remove? PS3, Line 664: nit extra space Line 1389: exec_summary_.__isset.nodes = true; Maybe clear exec_summary_.nodes here or DCHECK if it's not already empty. PS3, Line 1411: schedule.GetFragmentExecParams(fragment.id).instance_exec_params.size() can you move this out of the node loop (e.g. int num_frag_instances) so it's more clear that this is a per-fragment value? PS3, Line 1429: VLOG_QUERY << "MtInitExecProfiles"; remove? PS3, Line 1465: query_profile_->AddChild(data->averaged_profile, true); It looks like we previously were inserting the averaged_profiles after the previous fragment's averaged_profile, i.e. we specified the optional 'location' parameter. Is that something we don't need or want to be doing anymore? Line 1502: VLOG_QUERY << "MtExecRemoteFInstance"; remove? PS3, Line 1517: << " instance_id=" << exec_state->fragment_instance_id() : << " indent PS3, Line 2007: state->impalad_address(), initial_usage); : RuntimeProfile::Counter* mem_usage_counter = : state->profile()->GetCounter(PlanFragmentExecutor::PER_HOST_PEAK_MEM_COUNTER); : if (mem_usage_counter != NULL && mem_usage_counter->value() > *mem_usage) { : per_node_peak_mem_usage[state->impalad_address()] = mem_usage_counter->value(); : } If I understand this correctly, the meaning of this counter changes since I think we select the peak of the instances. People often use this information for capacity planning. Not necessarily an issue but we may need to adjust our guidance. (I'm mostly thinking out loud here.) PS3, Line 2045: VLOG_QUERY remove? PS3, Line 2131: // ? http://gerrit.cloudera.org:8080/#/c/4054/3/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: PS3, Line 346: num_instances initialize this as well? PS3, Line 455: const QuerySchedule& schedule, : const FInstanceExecParams& params, int instance_state_idx, : int fragment_instance_idx, : const TNetworkAddress& coord, TExecPlanFragmentParams* rpc_params nit: format these to avoid an extra line or two http://gerrit.cloudera.org:8080/#/c/4054/3/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: PS3, Line 188: // 0: multi-threaded execution mode, number of cores is the pool default Not really relevant for this change, but I think we should probably have the pool default get applied at the coordinator and pushed out explicitly. This could lead to inconsistencies btwn nodes if impalads don't have pool configurations in sync or don't all get updated fast enough. -- To view, visit http://gerrit.cloudera.org:8080/4054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-HasComments: Yes
