>From Ali Alsuliman <[email protected]>: Attention is currently required from: Vijay Sarathy. Ali Alsuliman has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143 )
Change subject: [ASTERIXDB-3046][COMP] Support cost based query optimization. ...................................................................... Patch Set 6: (20 comments) Patchset: PS6: here are initial comments from my first pass. File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/EnumerateJoinsRule.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/f3b91036_396017c2 PS4, Line 133: if (canTransform.isFalse()) { we can move this up right after "getJoinOpsAndLeafInputs". https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/08f5befa_e158c899 PS4, Line 137: if (emptyTupleAndDataSourceOps.size() != joinLeafInputsHashMap.size()) { i imagine this would happen if you visited an empty tuple source twice, somehow. in any case, we should log this with some useful information. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/498a4e66_e7983920 PS4, Line 551: Index index = joinEnum.findSampleIndex(scanOp, context); "findSampleIndex" does not seem to be related to joinEnum. we should refactor it out of joinEnum into a util class and declare it "static". File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/EnumerateJoinsRule.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/6fdd4064_e551dda3 PS6, Line 70: private int totalNumberOfJoins; : private DataSourceScanOperator dataSourceOp; : private EmptyTupleSourceOperator emptyTupleSourceOp; the way these variables are used suggests that they should be local variables. making them local variables makes it easy to reason about the class and not worry about how these variables should be maintained. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/f493cdc8_e3b1d02e PS6, Line 74: public EnumerateJoinsRule() { : } this should be removed. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/403fc27b_b8ba68b2 PS6, Line 77: JoinEnum joinEnum we should think about whether we need an interface similar to ICost and ICostMethods for joinEnum. Also, we should think about passing something like a factory that creates the joinEnum object which could provide a better way for extensions to provide their own implementation of joinEnum. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/d7cb4388_b292b135 PS6, Line 113: HashMap<EmptyTupleSourceOperator, ILogicalOperator> joinLeafInputsHashMap = new HashMap<>(); : // The data scan operators. Will be in the order of the from clause. : // Important for position ordering when assigning bits to join expressions. : List<Pair<EmptyTupleSourceOperator, DataSourceScanOperator>> emptyTupleAndDataSourceOps = new ArrayList<>(); : HashMap<DataSourceScanOperator, EmptyTupleSourceOperator> dataSourceEmptyTupleHashMap = new HashMap<>(); it feels like we can get rid of some of the data structures here. for example, the "dataSourceEmptyTupleHashMap" can be removed. the "emptyTupleAndDatSourceOps" could be a list of objects of a new class that we create. this new object will store the DataSourceScanOperator and the joinLeafInput associated with it (and the empty tuple source, if needed, although I don't see a use for it). we can certainly keep an accompanying map if we want to look up a joinLeafInput using this new object as a key. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/4e19508c_89f3900d PS6, Line 152: List<PlanNode> allPlans = joinEnum.getAllPlans(); it seems that joinEnum knows internally where the cheapest plan is. we could remove this line and replace allPlans.get(cheapestPlan) with joinEnum.getCheapestPlan() https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/362d9515_fd7a68be PS6, Line 154: joinEnum.enumerateJoins(); it's usually not safe to have methods return an integer that is used like a boolean. one can accidentally compare or return wrong values (and I think it happened in enumerateBaseLevelJoinNodes() and initializeBaseLevelJoinNodes(); check my comment there). we could make enumerateJoins() return true/false to indicate whether it was successful or not. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/8c5c7595_a4c5d6fb PS6, Line 198: containsLeafInputOnly what would happen if we encounter an operator that has more than one input like UNION_ALL? this may not work properly since it assumes (other than joins) that it will follow a linear path of operators until hitting the empty tuple source & data scan. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/0ed8ef3e_0dde6d35 PS6, Line 224: !(nextOp.getOperatorTag() == LogicalOperatorTag.ASSIGN) make it: nextOp.getOperatorTag() != LogicalOperatorTag.ASSIGN https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/ec3b9fd4_0715a220 PS6, Line 227: List<Mutable<ILogicalOperator>> nextOpInputs = nextOp.getInputs(); : if (nextOpInputs.size() > 1) { : return false; : } : return nextOpInputs.get(0).getValue().getOperatorTag() == LogicalOperatorTag.INNERJOIN; if we get here, then that means it's an ASSIGN operator which will always have one input. we can replace all of this with just: return nextOp.getInputs().get(0).getValue().getOperatorTag() == LogicalOperatorTag.INNERJOIN; https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/c71560e5_103d7714 PS6, Line 239: getJoinOpsAndLeafInputs it's safer to make this method return true/false instead of passing a mutable boolean and then having to make sure you maintain it & check it in the right places. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/282ba875_cc133c96 PS6, Line 445: true shouldn't we check that "expr" is an EQ? File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/util/JoinEnum.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/1bf7a526_a6ffeb5f PS6, Line 637: int lastBaseLevelJnNum = initializeBaseLevelJoinNodes(); i think there is a problem here. initializeBaseLevelJoinNodes() can return JoinNode.NO_CARDS (value -2) which is then compared to PlanNode.NO_PLAN (value -1). we should just make initializeBaseLevelJoinNodes() return true/false (and actually the same thing for enumerateBaseLevelJoinNodes()). https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/c585d82f_54ef00c5 PS6, Line 701: if (scanOp == null) { : continue; // what happens to the cards and sizes then? this may happen in case of in lists : } i don't think this will ever happen. we already checked that the data scan op is not null in the if block. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/9edec367_7ebc17a6 PS6, Line 726: jn.cardinality the data scan operator itself can have a select condition (and a limit to the number of output tuples). should we consider those, as well? File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/util/JoinNode.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/ff583f93_1960b7b8 PS6, Line 319: joinEnum.getCostMethodsHandle().costFullScan(this); we could replace this with "opCost" since we already calculated the cost (to avoid calculating it again) https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/38091e05_1c276669 PS6, Line 348: costFullScan(this); isn't this supposed to be costIndexScan(this);? -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143 To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Change-Id: I848adf6a8fdcfea360655ab649de2fb75a73c814 Gerrit-Change-Number: 17143 Gerrit-PatchSet: 6 Gerrit-Owner: Vijay Sarathy <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Glenn Galvizo <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Reviewer: Wail Alkowaileet <[email protected]> Gerrit-Reviewer: [email protected] Gerrit-Attention: Vijay Sarathy <[email protected]> Gerrit-Comment-Date: Tue, 06 Sep 2022 00:18:46 +0000 Gerrit-HasComments: Yes Gerrit-Has-Labels: No Gerrit-MessageType: comment
