Dmitry Lychagin has posted comments on this change. Change subject: [ASTERIXDB-2286][COMP][FUN][HYR] Parallel Sort Optimization ......................................................................
Patch Set 13: (13 comments) done. please take a look and let's discuss at some point. https://asterix-gerrit.ics.uci.edu/#/c/2393/13/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/CheckFullParallelSortRule.java File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/CheckFullParallelSortRule.java: Line 89: if (clusterNodeDomain.cardinality() == null || clusterNodeDomain.cardinality() <= 1) { can these checks be done in AbstractStableSortPOperator.getRequiredPropertiesForChildren()? Why should treat USE_DYNAMIC_RANGE as a hint there and only require OrderedPartitionedProperty there if this hint is set and these conditions are satisfied. Line 102: if (parent.getOperatorTag() == LogicalOperatorTag.LIMIT && ((LimitOperator) parent).isTopmostLimitOp()) { These checks seem to be requirements that top-most limit and running aggreagte operators should place on their children properties. So top-most limit should require UNPARTITIONED input. The we could check parent requirements in AbstractStableSortPOperator.getRequiredPropertiesForChildren() and decide whether to introduce full parallel sort there or not. This approach is more generic because it propagates requirements through the plan and other operators could add their own requirements. Can we do all this and eliminate this rule? https://asterix-gerrit.ics.uci.edu/#/c/2393/13/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/subplan/InlineAllNtsInSubplanVisitor.java File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/subplan/InlineAllNtsInSubplanVisitor.java: Line 648: // TODO(ali): check this let's just throw a compilation exception here. same as visitLeftOuterUnnestMapOperator() does. this visitor should not be called after the property enforcement rule. https://asterix-gerrit.ics.uci.edu/#/c/2393/13/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/subplan/SubplanSpecialFlatteningCheckVisitor.java File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/subplan/SubplanSpecialFlatteningCheckVisitor.java: Line 235: return visitInputs(op); throw an error here? there should not be any forward operators in the plan at this point https://asterix-gerrit.ics.uci.edu/#/c/2393/13/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/LangExpressionToPlanTranslator.java File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/LangExpressionToPlanTranslator.java: Line 1174: // TODO(ali): would it be better to use a variable instead of annotation? that way whenever someone creates here's an idea that we should discuss. What if we move no-full-parallel hint to the statement level, like we have "noindexonly" (AbstractIntroduceAccessMethodRule.NO_INDEX_ONLY_PLAN_OPTION). So by default we consider all orderbys to be full parallel with dynamic range (or static range if the range was provided as a hint on the order by clause). Just because we consider them full-parallel does not mean that they will become full parallel because they'll be subject to additional checks that are performed by CheckFullParallelSortRule. Also users can override this behavior by setting "nofullparallelsort" hint at the statement level. In that case we'll never consider them full parallel so they'll do global merge as before this feature. It's unlikely that users will need to control fullparallel per each orderbby clause, so per-statement setting should be sufficient. With this approach we don't need "unpartitioned" annotation in the parser, OrderbyClause.isFullParallel flag, and USE_DYNAMI C_RANGE annotation. We'll only need USE_RANGE_CONNECTOR annotation (that I suggested to rename as USE_STATIC_RANGE) and "nofullparallelsort" hint at the query level (need to decide on the name). What do you think? https://asterix-gerrit.ics.uci.edu/#/c/2393/13/asterixdb/asterix-app/src/main/java/org/apache/asterix/app/resource/RequiredCapacityVisitor.java File asterixdb/asterix-app/src/main/java/org/apache/asterix/app/resource/RequiredCapacityVisitor.java: Line 310: visitInternal(op, true); Forward is a blocking operator, should we call calculateMemoryUsageForBlockingOperators() here? https://asterix-gerrit.ics.uci.edu/#/c/2393/13/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/GlobalSamplingAggregateDescriptor.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/GlobalSamplingAggregateDescriptor.java: Line 173: ByteArrayInputStream byteStream = new ByteArrayInputStream(localSamples.getByteArray(), > ByteArrayAccessibleInputStream is not available at asterix-runtime. It's in let's move ByteArrayAccessibleInputStream to org.apache.hyracks.data.std.util https://asterix-gerrit.ics.uci.edu/#/c/2393/13/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/LocalSamplingAggregateDescriptor.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/LocalSamplingAggregateDescriptor.java: Line 77: private static final int NUM_SAMPLES = 100; have we thought about making it configurable (probably in a separate change)? Line 113: System.arraycopy(inputFieldValue.getByteArray(), inputFieldValue.getStartOffset(), fieldValue, 0, can we just produce the output right here at each step (using IAsterixListBuilder) instead of copying the data into java List and serializing in finish()? https://asterix-gerrit.ics.uci.edu/#/c/2393/13/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/logical/ForwardOperator.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/logical/ForwardOperator.java: Line 45: private LogicalVariable rangeMapVariable; minor, just for consistency. An operator assigns variables and consumes expressions. So instead of rangeMapVariable field it should be Mutable<ILogicalExpression> rangeMapExpr. And instead of passing a variable in the constructor we should pass VariableReferenceExpression(rangeMapVar) and modify acceptExpressionTransform() accordingly https://asterix-gerrit.ics.uci.edu/#/c/2393/13/hyracks-fullstack/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/EnforceStructuralPropertiesRule.java File hyracks-fullstack/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/EnforceStructuralPropertiesRule.java: Line 611: if (parentOp.getAnnotations().containsKey(OperatorAnnotations.USE_RANGE_CONNECTOR)) { > I didn't want to change the old code which handles the static range map sin sure. let's discuss. I think it makes sense to unify them at this point. if that implementation is not correct then it can be fixed separately. Line 666: // options for range partitioning: 1. range map computed at run time, 2. no range map?? > See my other comment related to this in createMergingConnector. Also, I'm n right, I also doubt that it was ever executed. but if we have a static map at this point then we can just pass it to the RangePartitionExchangePOperator https://asterix-gerrit.ics.uci.edu/#/c/2393/13/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/collectors/SequentialMergeFrameReader.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/collectors/SequentialMergeFrameReader.java: Line 69: sender.close(); what if one sender's close() fails. should we continue closing the remaining ones? -- To view, visit https://asterix-gerrit.ics.uci.edu/2393 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I73e128029a46f45e6b68c23dfb9310d5de10582f Gerrit-PatchSet: 13 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Dmitry Lychagin <[email protected]> Gerrit-Reviewer: Ildar Absalyamov <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Taewoo Kim <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Reviewer: abdullah alamoudi <[email protected]> Gerrit-HasComments: Yes
