Ali Alsuliman has posted comments on this change. Change subject: [ASTERIXDB-2286][COMP][FUN][HYR] Parallel Sort Optimization ......................................................................
Patch Set 25: (14 comments) https://asterix-gerrit.ics.uci.edu/#/c/2393/25/asterixdb/asterix-app/src/test/resources/optimizerts/queries/hints/fullparallelsort.sqlpp File asterixdb/asterix-app/src/test/resources/optimizerts/queries/hints/fullparallelsort.sqlpp: Line 21: * Description : Test no full parallel hint to disable full parallel sort. > the description is incorrect. I changed the test cases and put them in parallel_sort_enabled_disabled. https://asterix-gerrit.ics.uci.edu/#/c/2393/25/asterixdb/asterix-app/src/test/resources/optimizerts/results/introhashpartitionmerge.plan File asterixdb/asterix-app/src/test/resources/optimizerts/results/introhashpartitionmerge.plan: Line 16 > I think we need to turn off parallel sort for this query because the intent Sure. https://asterix-gerrit.ics.uci.edu/#/c/2393/25/asterixdb/asterix-lang-aql/pom.xml File asterixdb/asterix-lang-aql/pom.xml: Line 182: <version>0.3.4-SNAPSHOT</version> > remove hardcoded version from here Done https://asterix-gerrit.ics.uci.edu/#/c/2393/25/asterixdb/asterix-lang-common/pom.xml File asterixdb/asterix-lang-common/pom.xml: Line 109: <version>0.3.4-SNAPSHOT</version> > remove hard-coded version Done https://asterix-gerrit.ics.uci.edu/#/c/2393/25/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/LocalSamplingAggregateDescriptor.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/LocalSamplingAggregateDescriptor.java: Line 73: private static final int NUM_SAMPLES = 100; > we should make this configurable by users (in a separate change) Done now that we decided to allow enabling/disabling full parallel sort option through configuration. Line 121: oneSampleBuilder.addItem(inputFieldValue); > Should we check here that the inputFieldValue is of a primitive type? Other That's right. The list item type is ANY, and following with the practice we should open derived types. But it will raise another issue here. The data source could be of a closed-type. I open the records here and form a range map. Then, I get the records from the data source as closed-type and compare them with open-types when redistributing the records. I don't know how the comparison is going to work here. Currently, with the way we set up the pipeline, we shouldn't have a problem whether the records are closed-type or open-type since we are dealing with them as binary (we are using AObjectAscBinaryComparatorFactory). We should probably discuss it later and explore alternatives. https://asterix-gerrit.ics.uci.edu/#/c/2393/25/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/logical/visitors/SubstituteVariableVisitor.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/logical/visitors/SubstituteVariableVisitor.java: Line 466: op.getRangeMapExpression().getValue().substituteVar(arg.first, arg.second); > need to call substVarTypes(op, arg); after this line Done. I suppose some other operators need to be fixed as well then? Not sure why some operators call subVarTypes() and others don't. https://asterix-gerrit.ics.uci.edu/#/c/2393/25/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ForwardPOperator.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ForwardPOperator.java: Line 122: // TODO(ali): check this > seems to be fine. let's remove this comment Done https://asterix-gerrit.ics.uci.edu/#/c/2393/25/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/RangePartitionExchangePOperator.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/RangePartitionExchangePOperator.java: Line 136: final String splitCount = rangeMap == null ? "" : Integer.toString(rangeMap.getSplitCount()); > minor. let's no print "SPLIT COUNT: " if range map is dynamic so there's no Done https://asterix-gerrit.ics.uci.edu/#/c/2393/25/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/prettyprint/LogicalOperatorPrettyPrintVisitor.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/prettyprint/LogicalOperatorPrettyPrintVisitor.java: Line 470: addIndent(indent).append("forward"); > print rangemap key and expression? Done for the expression. rangemap key is the generated UUID. The plan wouldn't look nice showing the UUID. But if you insist on including it, I will add it :) https://asterix-gerrit.ics.uci.edu/#/c/2393/25/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/prettyprint/LogicalOperatorPrettyPrintVisitorJson.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/prettyprint/LogicalOperatorPrettyPrintVisitorJson.java: Line 652: addIndent(indent).append("\"operator\": \"forward\""); > print rangemap key and expression? Done for the expression. rangemap key is the generated UUID. The plan wouldn't look nice showing the UUID. But if you insist on including it, I will add it :) https://asterix-gerrit.ics.uci.edu/#/c/2393/25/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/collectors/SequentialMergeFrameReader.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/collectors/SequentialMergeFrameReader.java: Line 41: partitionBatchManager.getNextBatch(senders, numSenders); > is there a guarantee that the senders will be returned in the order you nee I have to fix this. You're right. The order is not guaranteed. Check the new batch manager. I also included a test case for this. https://asterix-gerrit.ics.uci.edu/#/c/2393/25/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/misc/ForwardOperatorDescriptor.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/misc/ForwardOperatorDescriptor.java: Line 222: TaskUtil.put(rangeMapKeyInContext, rangeMapState.rangeMap, ctx); > Seems like we can just call TaskUtil.put() in the RangeMapReaderActivityNod I believe that the reader activity's ctx is different from forward's ctx. They are in different ctx because there is a blocking connector between them. So, each activity runs in a separate thread. I see that TaskUtil.put() can only share information between activities in the same ctx. AbstractSorterOperatorDescriptor uses the same idea of a state object to share info between activities in different ctx. So, I guess that's the only option here for me as well. https://asterix-gerrit.ics.uci.edu/#/c/2393/25/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/sort/AbstractExternalSortRunMerger.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/sort/AbstractExternalSortRunMerger.java: Line 161: LOGGER.info("number of passes: " + numberOfPasses); > minor. let's move it to the debug level and merge with the previous line. Done -- To view, visit https://asterix-gerrit.ics.uci.edu/2393 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I73e128029a46f45e6b68c23dfb9310d5de10582f Gerrit-PatchSet: 25 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Dmitry Lychagin <[email protected]> Gerrit-Reviewer: Ildar Absalyamov <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Taewoo Kim <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-HasComments: Yes
