Wenhai Li has posted comments on this change. Change subject: Applied the multiway fuzzyjoin based on the prefix-based join and the selectFuzzyJoin testCases. ......................................................................
Patch Set 38: (18 comments) Since the newly introduced rules inside FuzzyJoinRuleCollection is triggered before its following ABC rule collections (remember ABC F ABC?). You know, we configured CondPushDownAndJoinInferenceRuleCollection as noDfs to avoid the dead loop of the iterative RemoveUnusedVars/AggsRule after PushSelectIntoJoinRule and its related rules. I guess the variable has been substituted by FuzzyCollections before the second round of ABC, which results in this inconsistencies. But after we check that difference, we find they are isomorphic, i.e., the only difference aims at the variable number $101 -> $102, does it matter? https://asterix-gerrit.ics.uci.edu/#/c/1076/38/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/compiler/provider/DefaultRuleSetFactory.java File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/compiler/provider/DefaultRuleSetFactory.java: Line 54: defaultLogicalRewrites > This line should not be changed. Let's keep it as it is. Refer to below, it was wrong. Most importantly, I cannot turn off the automatic grammatical checking of IntelliJ. https://asterix-gerrit.ics.uci.edu/#/c/1076/38/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java: Line 421: throw new AlgebricksException(e); > I think this should be a CompilationException and it needs to have an error Done Line 428: } catch (AsterixException e) { > We will not use AsterixException. So, after this patch set: https://asterix Done Line 429: throw new AlgebricksException(e); > I think this should be a CompilationException and it needs to have an error Done https://asterix-gerrit.ics.uci.edu/#/c/1076/38/asterixdb/asterix-app/src/test/resources/optimizerts/queries/fj-dblp-csx-hybrid.aql File asterixdb/asterix-app/src/test/resources/optimizerts/queries/fj-dblp-csx-hybrid.aql: Line 20: * Description : Multiple fuzzy join on three tables, with a star join condition. > tables -> datasets Done https://asterix-gerrit.ics.uci.edu/#/c/1076/38/asterixdb/asterix-app/src/test/resources/optimizerts/queries/fj-dblp-csx-selflink.aql File asterixdb/asterix-app/src/test/resources/optimizerts/queries/fj-dblp-csx-selflink.aql: Line 21: * The base table DBLP will fuzzy join with CSX and propagate the results > table -> dataset Done https://asterix-gerrit.ics.uci.edu/#/c/1076/38/asterixdb/asterix-app/src/test/resources/optimizerts/queries/fj-dblp-csx-simple.aql File asterixdb/asterix-app/src/test/resources/optimizerts/queries/fj-dblp-csx-simple.aql: Line 19: drop dataverse fj-dblp-csx if exists; > Please put a comment here - what this test does. Done Line 26: id: int32, > Remove this white space. Done Line 34: id: int32, > Remove this white space. Done https://asterix-gerrit.ics.uci.edu/#/c/1076/38/asterixdb/asterix-app/src/test/resources/optimizerts/queries/fj-dblp-csx-star.aql File asterixdb/asterix-app/src/test/resources/optimizerts/queries/fj-dblp-csx-star.aql: Line 21: * The CSX and DBLP tables are used twice and will be propagated onto > table -> dataset Done https://asterix-gerrit.ics.uci.edu/#/c/1076/38/asterixdb/asterix-app/src/test/resources/runtimets/queries/fuzzyjoin/dblp-csx-4.1.1/word-jaccard.1.ddl.aql File asterixdb/asterix-app/src/test/resources/runtimets/queries/fuzzyjoin/dblp-csx-4.1.1/word-jaccard.1.ddl.aql: Line 21: * We expect the join to be transformed into an prefix-based fuzzy join following with a < select. > an -> a "an 'less than' select" ? https://asterix-gerrit.ics.uci.edu/#/c/1076/38/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/logical/visitors/IsomorphismUtilities.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/logical/visitors/IsomorphismUtilities.java: Line 81: // Iteratively get all the produced variables from the downstream operators of root. > This returns an operator that produced the given PK variable. Right? Please Done Line 82: private static ILogicalOperator extractPKProduction(ILogicalOperator root, LogicalVariable pk) > extractPKProduction -> getOpThatProducePK, Done Line 92: validate = true; > Rather than using validate variable, you can just break here. Please check. Done Line 96: validate = true; > Rather than using validate variable, you can just break here. Please check. Done Line 107: public static void mergeHomogeneousPK(ILogicalOperator op, List<LogicalVariable> pks) throws AlgebricksException { > pks -> PKvars, Done Line 112: throw new AlgebricksException("Illegal variable production."); > Please follow the new exception handling proposal. How about the AlgebricksException standard? I noticed almost all the AlgebricksExceptions use this form, any more suggestion? Line 116: Map<LogicalVariable, LogicalVariable> variableMapping = new HashMap<>(); > What's key and value? Can we put a comment? Done -- To view, visit https://asterix-gerrit.ics.uci.edu/1076 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8736f104905eeda763d39709e002c2b9629278cc Gerrit-PatchSet: 38 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Wenhai Li <[email protected]> Gerrit-Reviewer: Chen Li <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Taewoo Kim <[email protected]> Gerrit-Reviewer: Wenhai Li <[email protected]> Gerrit-HasComments: Yes
