Wenhai Li has posted comments on this change.

Change subject: Applied the multiway fuzzyjoin based on the prefix-based join and the selectFuzzyJoin testCases.
......................................................................

Patch Set 39:

(21 comments)

All comments have been rephrased as requested. In IsomophicUtils, I use that validate var to avoid the Q warnings.

https://asterix-gerrit.ics.uci.edu/#/c/1076/39//COMMIT_MSG
Commit Message:

Line 9: - Enabled the fuzzyjoin rule.
> Enabled -> Enable

Done

Line 10: - Introduced eight existing rules in FuzzyJoinRuleCollections after applied the fuzzyjoin rule.
> Introduced -> Introduce

Done

Line 18: - hybrid multiway fuzzyjoin with the both forms of fuzzyjoins.
> Could you tell me what "link-like", "star-like" and "hybrid"? We can discus

Done

Line 19: - Add a running Cases for select-fuzzyjoin.
> What is select-fuzzyjoin?

A select over an existing prefix-based fuzzyjoin.

Line 20: - Change the inverted-based fuzzyjoin onto prefix-based join due to the efficiency considerations.
> inverted-based -> inverted-index-based

Done

https://asterix-gerrit.ics.uci.edu/#/c/1076/39/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java:

Line 138: @Override public boolean rewritePost(Mutable<ILogicalOperator> opRef, IOptimizationContext context)
> public boolean ... should start at new line.

Done

Line 141: // current operator is join
> --> current operator should be a join.

Done

Line 147: // Find GET_ITEM function.
> --> Find GET_ITEM function in the join condition.

Done

Line 149: Mutable<ILogicalExpression> expRef = joinOp.getCondition();
> expRef -> exprRef

Done

Line 190:
> Are we always sure that we always see two variables, not some expressions?

So far, composite conditions always place the fuzzy join condition first after compilation (line 156). In this similarity join condition, we always introduce a single-field fuzzy join, so the arguments are: 0: the left field's tokens, 1: the right field's tokens, 2: a threshold, if one has been introduced.
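
To make the argument layout in the reply above concrete, here is a minimal, self-contained sketch of how the three positions could be checked before use. It is not code from FuzzyJoinRule.java: the class FuzzyJoinArgs, the method fromArguments, and the use of plain strings for variables are all hypothetical stand-ins for the real ILogicalExpression arguments.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in AsterixDB.
// It models the layout described in the reply:
//   index 0: tokens of the left field
//   index 1: tokens of the right field
//   index 2: optional similarity threshold
final class FuzzyJoinArgs {
    final String leftTokensVar;
    final String rightTokensVar;
    final Optional<Double> threshold;

    private FuzzyJoinArgs(String leftTokensVar, String rightTokensVar, Optional<Double> threshold) {
        this.leftTokensVar = leftTokensVar;
        this.rightTokensVar = rightTokensVar;
        this.threshold = threshold;
    }

    // Validates the argument list instead of assuming its shape, which is the
    // concern raised by the reviewer ("two variables, not some expressions").
    static FuzzyJoinArgs fromArguments(List<?> args) {
        if (args.size() < 2 || args.size() > 3) {
            throw new IllegalArgumentException("expected 2 or 3 arguments, got " + args.size());
        }
        // In this sketch a "variable" is a String; anything else counts as an expression.
        if (!(args.get(0) instanceof String) || !(args.get(1) instanceof String)) {
            throw new IllegalArgumentException("arguments 0 and 1 must be variables");
        }
        Optional<Double> threshold = Optional.empty();
        if (args.size() == 3) {
            if (!(args.get(2) instanceof Double)) {
                throw new IllegalArgumentException("argument 2 must be a numeric threshold");
            }
            threshold = Optional.of((Double) args.get(2));
        }
        return new FuzzyJoinArgs((String) args.get(0), (String) args.get(1), threshold);
    }
}
```

Calling FuzzyJoinArgs.fromArguments with two variable names yields an empty threshold, while a third Double argument is stored as the threshold; any other shape is rejected rather than silently misread.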

Line 202: // leftInputPKs in currrentPKs extract all the PKs derived from the left branch in the newest fuzzyjoin.
> Could you explain the meaning of "newest fuzzy join"?

newest -> current

Line 212: IAType leftType = (IAType) context.getOutputTypeEnvironment(leftInputOp).getVarType(leftInputVar);
> You calculate this again here. leftType was already calculated in regardAsP

I moved the calculation ahead of regardAsPrefixFuzzyJoin and changed that function's parameters accordingly.

Line 258: // 2. Otherwise, we can apply this rule to its branches to trigger a prefix-based fuzzyjoin.
> For the above comments, I would like to suggest the following. Please check

Done

Line 259: private boolean regardAsPrefixFuzzyJoin(IOptimizationContext context, ILogicalOperator leftInputOp,
> regardAsPrefixFuzzyJoin -> isPrefixFuzzyJoin might be better?

Done

Line 265: // If PKs derived from the both branches are SAME as a previous fuzzyjoin, we treat this ~= as a select.
> we treat this ~= as a select ==> we treat this as a select over a fuzzy-joi

Done

Line 272: //Suppose we want to query on the same table on the different fields, i.e. A.a1 ~= B.b1 AND A.a2 ~= B.b2
> table -> dataset. Please apply this to the all places.

Done

Line 275: // Avoid the duplicated PK generation in findPrimaryKeysInSubplan, especially for multiway fuzzy join.
> Avoid --> Avoids

Done

Line 278: // Fail if primary keys could not be inferred.
> Fail --> Fails

Done

Line 285: // left-hand side and right-hand side of fuzzyjoin has the same type
> has the same type --> should be the same type.

Done

Line 428: } catch (CompilationException e) {
> CRITICAL SonarQube violation:

Done

Line 464: private Mutable<ILogicalExpression> getSimilarityExpression(Mutable<ILogicalExpression> expRef) {
> expRef -> exprRef

Done

--
To view, visit https://asterix-gerrit.ics.uci.edu/1076
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8736f104905eeda763d39709e002c2b9629278cc
Gerrit-PatchSet: 39
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Wenhai Li <[email protected]>
Gerrit-Reviewer: Chen Li <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Taewoo Kim <[email protected]>
Gerrit-Reviewer: Wenhai Li <[email protected]>
Gerrit-HasComments: Yes
