[GitHub] incubator-rya pull request #255: RYA-417 Forward-chaining batch rules engine

jessehatfield Tue, 09 Jan 2018 15:31:23 -0800

Github user jessehatfield commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/255#discussion_r160554313
  
    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/aggregation/SparqlToPipelineTransformVisitor.java ---
    @@ -0,0 +1,196 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.rya.mongodb.aggregation;
    +
    +import java.util.Arrays;
    +
    +import org.apache.rya.mongodb.StatefulMongoDBRdfConfiguration;
    +import org.bson.Document;
    +import org.openrdf.query.algebra.Distinct;
    +import org.openrdf.query.algebra.Extension;
    +import org.openrdf.query.algebra.Filter;
    +import org.openrdf.query.algebra.Join;
    +import org.openrdf.query.algebra.MultiProjection;
    +import org.openrdf.query.algebra.Projection;
    +import org.openrdf.query.algebra.Reduced;
    +import org.openrdf.query.algebra.StatementPattern;
    +import org.openrdf.query.algebra.helpers.QueryModelVisitorBase;
    +
    +import com.google.common.base.Preconditions;
    +import com.mongodb.MongoClient;
    +import com.mongodb.client.MongoCollection;
    +import com.mongodb.client.MongoDatabase;
    +
    +/**
    + * Visitor that transforms a SPARQL query tree by replacing as much of the tree
    + * as possible with one or more {@code AggregationPipelineQueryNode}s.
    + * <p>
    + * Each {@link AggregationPipelineQueryNode} contains a MongoDB aggregation
    + * pipeline which is equivalent to the replaced portion of the original query.
    + * Evaluating this node executes the pipeline and converts the results into
    + * query solutions. If only part of the query was transformed, the remaining
    + * query logic (higher up in the query tree) can be applied to those
    + * intermediate solutions as normal.
    + * <p>
    + * In general, processes the tree in bottom-up order: A leaf node
    + * ({@link StatementPattern}) is replaced with a pipeline that matches the
    + * corresponding statements. Then, if the parent node's semantics are supported
    + * by the visitor, stages are appended to the pipeline and the subtree at the
    + * parent node is replaced with the extended pipeline. This continues up the
    + * tree until reaching a node that cannot be transformed, in which case that
    + * node's child is now a single {@code AggregationPipelineQueryNode} (a leaf
    + * node) instead of the previous subtree, or until the entire tree has been
    + * subsumed into a single pipeline node.
    + * <p>
    + * Nodes which are transformed into pipeline stages:
    + * <p><ul>
    + * <li>A {@code StatementPattern} node forms the beginning of each pipeline.
    + * <li>Single-argument operations {@link Projection}, {@link MultiProjection},
    + * {@link Extension}, {@link Distinct}, and {@link Reduced} will be transformed
    + * into pipeline stages whenever the child {@link TupleExpr} represents a
    + * pipeline.
    + * <li>A {@link Filter} operation will be appended to the pipeline when its
    + * child {@code TupleExpr} represents a pipeline and the filter condition is a
    + * type of {@link ValueExpr} understood by {@code AggregationPipelineQueryNode}.
    + * <li>A {@link Join} operation will be appended to the pipeline when one child
    + * is a {@code StatementPattern} and the other is an
    + * {@code AggregationPipelineQueryNode}.
    + * </ul>
    + */
    +public class SparqlToPipelineTransformVisitor extends QueryModelVisitorBase<Exception> {
    --- End diff --
    
    It doesn't strictly have to be executed first, though it may be better. For example, if the tree is `Join(<somethingComplex>, AggregationPipelineQueryNode)`, then the join iterator will get an iterator of results from the complex thing on the left and then, for each result, execute the pipeline, which is likely not optimal.
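    
    To make the cost concrete, here is a toy sketch of that evaluation order. It is not Rya's or OpenRDF's join iterator: `NestedLoopSketch`, `join`, and `rightEvaluations` are hypothetical names, and the lambda merely stands in for executing the aggregation pipeline once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Toy nested-loop join; NOT the Rya or OpenRDF join iterator. */
final class NestedLoopSketch {
    /** Counts how many times the right-hand side is evaluated. */
    static int rightEvaluations = 0;

    /** For each left result, evaluates the right side once with that result bound. */
    static List<String> join(List<String> leftResults,
                             Function<String, List<String>> right) {
        List<String> out = new ArrayList<>();
        for (String binding : leftResults) {
            rightEvaluations++; // one right-side execution per left result
            for (String match : right.apply(binding)) {
                out.add(binding + "+" + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed results of the complex left-hand subtree.
        List<String> left = Arrays.asList("a", "b", "c");
        // Hypothetical stand-in for running the pipeline once.
        Function<String, List<String>> pipeline =
                b -> Arrays.asList(b.toUpperCase());
        System.out.println(join(left, pipeline));
        System.out.println(rightEvaluations);
    }
}
```

    In this toy model, three left-hand results cause three right-hand evaluations, so the number of pipeline executions grows with the size of the left-hand result set.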
    
    The only logic that tries to group pipeline-amenable nodes/subqueries together is in this visitor; there's no restructuring done at a higher level to make it work better (except to the extent that ordinary query planning steps may happen to help). For example, this visitor can turn `Join(Join(Join(SP1, SP2), SP3), SP4)` into a single `AggregationPipelineQueryNode`, but it can only turn `Join(Join(SP1, SP2), Join(SP3, SP4))` into `Join(AggregationPipelineQueryNode1, AggregationPipelineQueryNode2)`. Ideally the query would take the former form, and in this case I believe it typically does. But there's room for development and testing in terms of which query forms this optimization is good for, and how it should interact with the rest of query planning. Uncertainty here is the main reason I left this optimization disabled by default.
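    
    The difference between the two join shapes can be sketched with a toy model. This is a simplified illustration under stated assumptions, not the visitor's actual code: `PipelineSketch`, `SP`, `Join`, `Pipeline`, and `transform` are hypothetical stand-ins, and the only rule modeled is the one from the Javadoc (a join is absorbed when one child is a statement pattern and the other is already a pipeline).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the rewrite; these are NOT the Rya or OpenRDF classes. */
final class PipelineSketch {
    /** Minimal stand-in for a query tree node. */
    interface Node {}

    /** Stand-in for a StatementPattern leaf. */
    static final class SP implements Node {
        final String name;
        SP(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    /** Stand-in for a binary Join node. */
    static final class Join implements Node {
        final Node left;
        final Node right;
        Join(Node left, Node right) { this.left = left; this.right = right; }
        @Override public String toString() { return "Join(" + left + ", " + right + ")"; }
    }

    /** Stand-in for AggregationPipelineQueryNode: records absorbed patterns. */
    static final class Pipeline implements Node {
        final List<String> patterns = new ArrayList<>();
        Pipeline(String first) { patterns.add(first); }
        @Override public String toString() { return "Pipeline" + patterns; }
    }

    /**
     * Bottom-up rewrite. A join is absorbed only when one of its original
     * children is a leaf pattern and the other child became a pipeline.
     */
    static Node transform(Node node) {
        if (node instanceof SP) {
            return new Pipeline(((SP) node).name);
        }
        Join join = (Join) node;
        Node left = transform(join.left);
        Node right = transform(join.right);
        if (join.right instanceof SP && left instanceof Pipeline) {
            ((Pipeline) left).patterns.add(((SP) join.right).name);
            return left;
        }
        if (join.left instanceof SP && right instanceof Pipeline) {
            ((Pipeline) right).patterns.add(((SP) join.left).name);
            return right;
        }
        // Neither original child is a leaf pattern: the join stays in the tree.
        return new Join(left, right);
    }

    public static void main(String[] args) {
        Node leftDeep = new Join(new Join(new Join(
                new SP("SP1"), new SP("SP2")), new SP("SP3")), new SP("SP4"));
        Node bushy = new Join(
                new Join(new SP("SP1"), new SP("SP2")),
                new Join(new SP("SP3"), new SP("SP4")));
        System.out.println(transform(leftDeep)); // Pipeline[SP1, SP2, SP3, SP4]
        System.out.println(transform(bushy));    // Join(Pipeline[SP1, SP2], Pipeline[SP3, SP4])
    }
}
```

    In the toy model the left-deep tree collapses into one pipeline because every join has a leaf pattern on one side, while the top-level join of the bushy tree has a join on both sides and so survives the rewrite.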

---
