Yingyi Bu has posted comments on this change. Change subject: Equivalence induced onetoone operatiions for insert/delete/upsert/query exchange optimization for lookup and delete when the operation is merely directed by the primary key. ......................................................................
Patch Set 10: (26 comments) Here are some high-level comments: 1. We need to revisit what needs to be added into the partitioning property to indicate where the operator actually needs to be run. I'm not sure the abstraction of "isSingle" is enough. I inlined two motivating queries. 2. The caller of an IMetadataProvider probably shouldn't be a aware of a node id, but a hash value, or more general, the INodeDomain information. I think this also needs a discussion. 3. The code style seems not follow the recommended one: http://asterixdb.incubator.apache.org/dev-setup.html Other detailed comments are inlined. https://asterix-gerrit.ics.uci.edu/#/c/427/10/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/metadata/IMetadataProvider.java File algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/metadata/IMetadataProvider.java: Line 71: JobGenContext context, JobSpecification jobSpec, boolean bulkload, int nodeId) throws AlgebricksException; int nodeId->Integer hashValue I guess the caller shouldn't be aware of nodeId, but only hashValue. if the Integer is null, that means the reader needs to run on all storage nodes. Line 75: LogicalVariable payLoadVar, List<LogicalVariable> additionalNonKeyFields, RecordDescriptor recordDesc, int nodeId->Integer hashValue I guess the caller shouldn't be aware of nodeId, but only hashValue. if the Integer is null, that means the reader needs to run on all storage nodes. Line 115: JobGenContext context, JobSpecification spec, boolean bulkload, int nodeId) throws AlgebricksException; int nodeId->Integer hashValue I guess the caller shouldn't be aware of nodeId, but only hashValue. if the Integer is null, that means the reader needs to run on all storage nodes. Line 154: JobGenContext context, JobSpecification spec, int nodeId) throws AlgebricksException; int nodeId->Integer hashValue I guess the caller shouldn't be aware of nodeId, but only hashValue. if the Integer is null, that means the reader needs to run on all storage nodes. Line 199: RecordDescriptor recordDesc, JobGenContext context, JobSpecification jobSpec, int nodeId) int nodeId->Integer hashValue I guess the caller shouldn't be aware of nodeId, but only hashValue. if the Integer is null, that means the reader needs to run on all storage nodes. Line 207: RecordDescriptor inputDesc, JobGenContext context, JobSpecification spec, int nodeId) int nodeId->Integer hashValue I guess the caller shouldn't be aware of nodeId, but only hashValue. if the Integer is null, that means the reader needs to run on all storage nodes. https://asterix-gerrit.ics.uci.edu/#/c/427/10/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/logical/visitors/FDsAndEquivClassesVisitor.java File algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/logical/visitors/FDsAndEquivClassesVisitor.java: Line 316: + GroupByOperator.veListToString(gByList) + " to " + GroupByOperator.veListToString(newGbyList) It seems your code style is not right. https://asterix-gerrit.ics.uci.edu/#/c/427/10/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/logical/visitors/ProducedVariableExpressionVisitor.java File algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/logical/visitors/ProducedVariableExpressionVisitor.java: Line 66: * @author whli add some comments to describe what this class is used for. Line 72: public ProducedVariableExpressionVisitor(Map<List<LogicalVariable>, List<ILogicalExpression>> varToExpr) { Why not map a variable to an expression? Line 78: return null; aggregate produces variables. Line 83: return null; running aggregate produces variables. Line 93: return null; group by produces variables. Line 129: varToExpr.put(vars, exprs); why not map each individual variable to its definition expression? Line 169: public Void visitSubplanOperator(SubplanOperator op, Void arg) throws AlgebricksException { subplan produces variables. Line 180: // TODO Auto-generated method stub union produces variables. Line 186: // TODO Auto-generated method stub unnest produces variables. Line 204: } data scan produces variables. Line 234: List<LogicalVariable> vars = new ArrayList<LogicalVariable>(); insertdelete doesn't produces any variables. Here the code maps used variables to null? Line 256: // TODO Auto-generated method stub intersect produces variables. Line 262: // TODO Auto-generated method stub outer unnest produces variables. https://asterix-gerrit.ics.uci.edu/#/c/427/10/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/BulkloadPOperator.java File algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/BulkloadPOperator.java: Line 106: inputDesc, context, spec, true, -1); pass in null instead of -1. https://asterix-gerrit.ics.uci.edu/#/c/427/10/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/IndexBulkloadPOperator.java File algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/IndexBulkloadPOperator.java: Line 131: additionalFilteringKeys, filterExpr, inputDesc, context, spec, true, -1); -1 -> null https://asterix-gerrit.ics.uci.edu/#/c/427/10/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/InsertDeleteUpsertPOperator.java File algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/InsertDeleteUpsertPOperator.java: Line 105: ((UnorderedPartitionedProperty) r.getPartitioningProperty()).setSingle(this.isSinglePartition()); There shouldn't be a special handling of isSinglePart. https://asterix-gerrit.ics.uci.edu/#/c/427/10/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/properties/IPartitioningProperty.java File algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/properties/IPartitioningProperty.java: Line 94: public void setSingle(boolean isSingle); This looks to be an immutable field, thus it's better to remove the setter and enforce that in the constructor. Line 96: public boolean isSingle(); I'm not sure whether isSingle() is enough to capture the partitioning property. You also need to know which partition(s) are involved. for example, consider the following query: create dataset foo(id int32, value string); create dataset bar(id int32, value string); insert into dataset foo( for $d in dataset bar where $d.key = 1 return { "key": $d.key + 3, "value": $d.value } ); The primary key search for bar goes to one node and the insert to foo goes to one node. Both datasets are hash partitioned in the same way. But the two nodes could be different nodes. I'm thinking where we can extend DefaultNodeGroupDomain with a boolean vector to indicate where the operator instances need to be run. Also consider the following query (more general cases): for $d in dataset bar where $d.key = 1 OR $d.key=2 return $d It would be nice that the query can only go to two nodes (or one node), depending on the hash function. That also seems to require an more general abstraction than isSingle. https://asterix-gerrit.ics.uci.edu/#/c/427/10/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/EnforceStructuralPropertiesRule.java File algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/EnforceStructuralPropertiesRule.java: Line 97: it seems the code style is not right. -- To view, visit https://asterix-gerrit.ics.uci.edu/427 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic836210e57b87128120e1f8dabbfed062d09f5e4 Gerrit-PatchSet: 10 Gerrit-Project: hyracks Gerrit-Branch: master Gerrit-Owner: Wenhai Li <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Steven Jacobs <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Reviewer: Wenhai Li <[email protected]> Gerrit-Reviewer: Yingyi Bu <[email protected]> Gerrit-HasComments: Yes
