Alex Behm has posted comments on this change. Change subject: MT: Planner for multi-threaded execution ......................................................................
Patch Set 1: (26 comments) http://gerrit.cloudera.org:8080/#/c/2846/1//COMMIT_MSG Commit Message: Line 18: section yet (which will happen at a later date, to cover more corner cases). Mention that the changes are intentionally additive so as to not break the existing functionality, and that eventually the parallel planning might be merged into single and distributed plan creation. http://gerrit.cloudera.org:8080/#/c/2846/1/be/src/service/query-options.cc File be/src/service/query-options.cc: Line 391: case TImpalaQueryOptions::MT_NUM_CORES: { How about NUM_THREADS? Reasons: * more accurate (no guarantee that you'll actually get those many real cores) * more consistent with NUM_NODES * avoids potential confusion with V_CPU_CORES http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/common/TreeNode.java File fe/src/main/java/com/cloudera/impala/common/TreeNode.java: Line 27: protected TreeNode() {} if this is to prevent public construction, add comment Line 41: public void removeChildren() { children_.clear(); } clearChildren() seems clearer http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/CohortId.java File fe/src/main/java/com/cloudera/impala/planner/CohortId.java: Line 1: // Copyright 2012 Cloudera Inc. 2016 and elsewhere http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java: Line 617: unionNode.removeChildren(); can be moved above this block (needs to be done in any case) Line 622: // TODO: provide a more rigid interface for this, this feels error-prone. are we really going to address this TODO? if it's easy enough might as well do it now, or alternatively remove it http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java File fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java: Line 59: public HashTableId getHashTableId() { return hashTableId_; } Use JoinBuildId and move to JoinNode? Line 146: if (hashTableId_.isValid()) { I'd prefer to only add this in VERBOSE and EXTENDED http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/HashTableBuildNode.java File fe/src/main/java/com/cloudera/impala/planner/HashTableBuildNode.java: Line 38: public class HashTableBuildNode extends PlanNode { Why is this a PlanNode and not a DataSink? Seems more logical since a join build does not have an "output" in the GetNext() sense. Might as well call this JoinBuildSink. Line 45: * Moves the build input of joinNode to this node (and connects this node remove parenthesis, that part seems important Line 69: super.init(analyzer); set cardinality and hosts, Preconditions check that there are no conjuncts http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/JoinNode.java File fe/src/main/java/com/cloudera/impala/planner/JoinNode.java: Line 64: // TODO: change to a more generic abstraction, NLJs don't have hash tables This TODO sounds easy enough to address now (just rename the new stuff) http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/ParallelPlanner.java File fe/src/main/java/com/cloudera/impala/planner/ParallelPlanner.java: Line 34: * The plans that produce intermediate hash tables for a particular plan P are grouped no need for the "P" (it's not used anywhere else) Line 51: * Given a distributed plan (expressed as the root of a tree of plan fragments), suggest removing the part in parenthesis and renaming the "plan" param to "root" Line 73: LOG.info("createbuildplans fragment " + fragment.getId().toString()); let's not forget to eventually remove some of this logging Line 101: if (node instanceof JoinNode) { we should also only traverse the left child of a SubplanNode http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/PlanFragment.java File fe/src/main/java/com/cloudera/impala/planner/PlanFragment.java: Line 78: private CohortId cohortId_; needs comment or explanation in class comment Line 111: LOG.info("new fragment " + id.toString()); remove http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/Planner.java File fe/src/main/java/com/cloudera/impala/planner/Planner.java: Line 172: return planner.createPlans(distrPlan.get(0)); squeeze this in here: ctx_.getRootAnalyzer().getTimeline().markEvent("Parallel plan created") http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java File fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java: Line 402: testPlan(testCase, Section.DISTRIBUTEDPLAN, queryCtx, errorLog, actualOutput); extra space Line 404: if (!testCase.getSectionContents(Section.PARALLELPLANS).isEmpty()) { The getSectionContents() check is already performed in testPlan() below. Not checking that here has the advantage that we'll run all tests through the ParallelPlanner even if we don't have an expected test section for it. http://gerrit.cloudera.org:8080/#/c/2846/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test: Line 334: | | plan id: 01 I'm not sure the plan id is really necessary. I'm ok leaving it but I'd suggest condensing the explain output to something like: plan-id=01 build-id=00 cohort-id=01 I also think we should print the plan-id for all nodes to that it becomes clearer what it actually is. Line 351: | |--32:BUILD [01] How about JOIN BUILD? That seems meaningful and generic enough to also cover the nested loop join case. Line 356: | | | tuple-ids=4 row-size=0B cardinality=unavailable even though it's not needed, it would be nice to preserve the cardinality and per-host-mem Line 5105: | | | build expressions: whitespace -- To view, visit http://gerrit.cloudera.org:8080/2846 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c34dd3f9190a131e6f03d901b4bfcd164a5174 Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-HasComments: Yes
