>From Ali Alsuliman <[email protected]>:

Attention is currently required from: Vijay Sarathy.
Ali Alsuliman has posted comments on this change. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143 )

Change subject: [ASTERIXDB-3046][COMP] Support cost based query optimization.
......................................................................


Patch Set 6:

(20 comments)

Patchset:

PS6:
here are initial comments from my first pass.


File 
asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/EnumerateJoinsRule.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/f3b91036_396017c2
PS4, Line 133:         if (canTransform.isFalse()) {
we can move this up right after "getJoinOpsAndLeafInputs".


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/08f5befa_e158c899
PS4, Line 137:         if (emptyTupleAndDataSourceOps.size() != 
joinLeafInputsHashMap.size()) {
i imagine this would happen if you visited an empty tuple source twice, somehow.
in any case, we should log this with some useful information.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/498a4e66_e7983920
PS4, Line 551:                 Index index = joinEnum.findSampleIndex(scanOp, 
context);
"findSampleIndex" does not seem to be related to joinEnum. we should refactor 
it out of joinEnum into a util class and declare it "static".


File 
asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/EnumerateJoinsRule.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/6fdd4064_e551dda3
PS6, Line 70: private int totalNumberOfJoins;
            :     private DataSourceScanOperator dataSourceOp;
            :     private EmptyTupleSourceOperator emptyTupleSourceOp;
the way these variables are used suggests that they should be local variables.
making them local variables makes it easy to reason about the class and not 
worry about how these variables should be maintained.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/f493cdc8_e3b1d02e
PS6, Line 74: public EnumerateJoinsRule() {
            :     }
this should be removed.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/403fc27b_b8ba68b2
PS6, Line 77: JoinEnum joinEnum
we should think about whether we need an interface similar to ICost and 
ICostMethods for joinEnum.
Also, we should think about passing something like a factory that creates the 
joinEnum object which could provide a better way for extensions to provide 
their own implementation of joinEnum.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/d7cb4388_b292b135
PS6, Line 113: HashMap<EmptyTupleSourceOperator, ILogicalOperator> 
joinLeafInputsHashMap = new HashMap<>();
             :         // The data scan operators. Will be in the order of the 
from clause.
             :         // Important for position ordering when assigning bits 
to join expressions.
             :         List<Pair<EmptyTupleSourceOperator, 
DataSourceScanOperator>> emptyTupleAndDataSourceOps = new ArrayList<>();
             :         HashMap<DataSourceScanOperator, 
EmptyTupleSourceOperator> dataSourceEmptyTupleHashMap = new HashMap<>();
it feels like we can get rid of some of the data structures here. for example, 
the "dataSourceEmptyTupleHashMap" can be removed. the 
"emptyTupleAndDatSourceOps" could be a list of objects of a new class that we 
create. this new object will store the DataSourceScanOperator and the 
joinLeafInput associated with it (and the empty tuple source, if needed, 
although I don't see a use for it). we can certainly keep an accompanying map 
if we want to look up a joinLeafInput using this new object as a key.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/4e19508c_89f3900d
PS6, Line 152: List<PlanNode> allPlans = joinEnum.getAllPlans();
it seems that joinEnum knows internally where the cheapest plan is. we could 
remove this line and replace allPlans.get(cheapestPlan) with 
joinEnum.getCheapestPlan()


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/362d9515_fd7a68be
PS6, Line 154: joinEnum.enumerateJoins();
it's usually not safe to have methods return an integer that is used like a 
boolean. one can accidentally compare or return wrong values (and I think it 
happened in enumerateBaseLevelJoinNodes() and initializeBaseLevelJoinNodes(); 
check my comment there). we could make enumerateJoins() return true/false to 
indicate whether it was successful or not.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/8c5c7595_a4c5d6fb
PS6, Line 198: containsLeafInputOnly
what would happen if we encounter an operator that has more than one input like 
UNION_ALL? this may not work properly since it assumes (other than joins) that 
it will follow a linear path of operators until hitting the empty tuple source 
& data scan.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/0ed8ef3e_0dde6d35
PS6, Line 224: !(nextOp.getOperatorTag() == LogicalOperatorTag.ASSIGN)
make it:
nextOp.getOperatorTag() != LogicalOperatorTag.ASSIGN


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/ec3b9fd4_0715a220
PS6, Line 227: List<Mutable<ILogicalOperator>> nextOpInputs = 
nextOp.getInputs();
             :         if (nextOpInputs.size() > 1) {
             :             return false;
             :         }
             :         return nextOpInputs.get(0).getValue().getOperatorTag() 
== LogicalOperatorTag.INNERJOIN;
if we get here, then that means it's an ASSIGN operator which will always have 
one input. we can replace all of this with just:
return nextOp.getInputs().get(0).getValue().getOperatorTag() == 
LogicalOperatorTag.INNERJOIN;


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/c71560e5_103d7714
PS6, Line 239: getJoinOpsAndLeafInputs
it's safer to make this method return true/false instead of passing a mutable 
boolean and then having to make sure you maintain it & check it in the right 
places.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/282ba875_cc133c96
PS6, Line 445: true
shouldn't we check that "expr" is an EQ?


File 
asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/util/JoinEnum.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/1bf7a526_a6ffeb5f
PS6, Line 637: int lastBaseLevelJnNum = initializeBaseLevelJoinNodes();
i think there is a problem here. initializeBaseLevelJoinNodes() can return 
JoinNode.NO_CARDS (value -2) which is then compared to PlanNode.NO_PLAN (value 
-1).
we should just make initializeBaseLevelJoinNodes() return true/false (and 
actually the same thing for enumerateBaseLevelJoinNodes()).


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/c585d82f_54ef00c5
PS6, Line 701: if (scanOp == null) {
             :                     continue; // what happens to the cards and 
sizes then? this may happen in case of in lists
             :                 }
i don't think this will ever happen. we already checked that the data scan op 
is not null in the if block.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/9edec367_7ebc17a6
PS6, Line 726: jn.cardinality
the data scan operator itself can have a select condition (and a limit to the 
number of output tuples). should we consider those, as well?


File 
asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/util/JoinNode.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/ff583f93_1960b7b8
PS6, Line 319: joinEnum.getCostMethodsHandle().costFullScan(this);
we could replace this with "opCost" since we already calculated the cost (to 
avoid calculating it again)


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143/comment/38091e05_1c276669
PS6, Line 348: costFullScan(this);
isn't this supposed to be costIndexScan(this);?



--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17143
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I848adf6a8fdcfea360655ab649de2fb75a73c814
Gerrit-Change-Number: 17143
Gerrit-PatchSet: 6
Gerrit-Owner: Vijay Sarathy <[email protected]>
Gerrit-Reviewer: Ali Alsuliman <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Glenn Galvizo <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Till Westmann <[email protected]>
Gerrit-Reviewer: Wail Alkowaileet <[email protected]>
Gerrit-Reviewer: [email protected]
Gerrit-Attention: Vijay Sarathy <[email protected]>
Gerrit-Comment-Date: Tue, 06 Sep 2022 00:18:46 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Gerrit-MessageType: comment

Reply via email to