[
https://issues.apache.org/jira/browse/JENA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012684#comment-17012684
]
Andy Seaborne commented on JENA-1813:
-------------------------------------
Hi [~ssmith],
Thank you for the detailed report and the test case - they make this much
easier to investigate.
It's the "AS" in the SELECT that is the cause.
A simpler example:
{noformat}
PREFIX : <http://example/>
SELECT * {
:s :p ?A
{ GRAPH :g { BIND(1 AS ?A) } }
}
{noformat}
I think the root cause is that the variable classifier does not handle
assigned variables correctly. It tracks variables set by a pattern in the same
way as those set by AS. But while a pattern variable can be replaced by a
value, an AS variable cannot. In fact, tracking assignments separately may
also improve the optimizer.
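To illustrate the difference, here is a toy model in plain Java. It is not
Jena code: the class and method names are made up for this sketch, and a
binding is modelled as a simple map from variable name to value. It only
mimics the observed behaviour, where a join rejects rows that disagree on ?A
while a sequence step that treats the assigned variable like a pattern
variable lets the incoming value through unchecked.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical names, not Jena's implementation).
public class JoinVsSequence {

    /** Two rows are compatible if every shared variable has the same value. */
    public static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    /** Join: keep only the merged rows whose shared variables agree. */
    public static List<Map<String, String>> join(List<Map<String, String>> left,
                                                 List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    Map<String, String> merged = new HashMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
            }
        }
        return out;
    }

    /**
     * Naive sequence over an assignment "(value AS var)": each left row is
     * passed through, and the assignment is applied without checking whether
     * var is already bound to a different value.
     */
    public static List<Map<String, String>> naiveSequenceWithAssign(
            List<Map<String, String>> left, String var, String value) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            Map<String, String> row = new HashMap<>(l);
            row.putIfAbsent(var, value); // existing binding wins, conflict goes unnoticed
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> left =
            Collections.singletonList(Collections.singletonMap("A", "a"));
        List<Map<String, String>> right =
            Collections.singletonList(Collections.singletonMap("A", "b"));

        System.out.println("join:     " + join(left, right));                        // []
        System.out.println("sequence: " + naiveSequenceWithAssign(left, "A", "b")); // [{A=a}]
    }
}
```

In this model the join returns no rows, which is the expected answer, while
the naive sequence returns the left-hand row, which matches the wrong result
in the report.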
That change may take some time and, depending on its complexity, may not be
something to do this close to the 3.14.0 release, so I'll also investigate
your patch.
> Join optimization transform results in incorrect query results
> --------------------------------------------------------------
>
> Key: JENA-1813
> URL: https://issues.apache.org/jira/browse/JENA-1813
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 3.13.1
> Reporter: Shawn Smith
> Assignee: Andy Seaborne
> Priority: Major
>
> I think I've found a query where TransformJoinStrategy incorrectly decides
> that a query is linear such that a "join" operation can be replaced by a
> "sequence" operation. As a result, the query returns incorrect results.
> Disabling optimizations with "qe.getContext().set(ARQ.optimization, false)"
> fixes the issue.
> Here's the query:
> {noformat}
> PREFIX : <http://example.com/>
> SELECT ?a
> WHERE {
> GRAPH :graph { :s :p ?a }
> GRAPH :graph {
> SELECT (?b AS ?a)
> WHERE { :t :q ?b }
> GROUP BY ?b
> }
> }
> {noformat}
> Here's the data to test it with (two quads, as TriG):
> {noformat}
> @prefix : <http://example.com/> .
> :graph {
> :s :p "a" .
> :t :q "b" .
> }
> {noformat}
> I expected the query to return zero results because the two GRAPH clauses
> can't find compatible bindings for ?a. But, in practice, Jena returns ?a="a"
> and logs a warning:
> {noformat}
> [main] WARN BindingUtils - merge: Mismatch : "a" != "b"
> {noformat}
> Note the warning is actually coming from QueryIterProjectMerge.java, not
> BindingUtils.java. With more complicated queries and datasets, this issue
> can result in thousands or millions of logged warnings.
> The query plan before optimization looks like this:
> {noformat}
> (project (?a)
> (join
> (graph <http://example.com/graph>
> (bgp (triple <http://example.com/s> <http://example.com/p> ?a)))
> (graph <http://example.com/graph>
> (project (?a)
> (extend ((?a ?b))
> (group (?b)
> (bgp (triple <http://example.com/t> <http://example.com/q>
> ?b))))))))
> {noformat}
> Optimization replaces "join" with "sequence" which fails to detect conflicts
> on ?a:
> {noformat}
> (project (?a)
> (sequence
> (graph <http://example.com/graph>
> (bgp (triple <http://example.com/s> <http://example.com/p> ?a)))
> (graph <http://example.com/graph>
> (project (?a)
> (extend ((?a ?/b))
> (group (?/b)
> (bgp (triple <http://example.com/t> <http://example.com/q>
> ?/b))))))))
> {noformat}
> For convenience, here's Java code that reproduces the bug:
> {noformat}
> import org.apache.jena.query.ARQ;
> import org.apache.jena.query.Dataset;
> import org.apache.jena.query.DatasetFactory;
> import org.apache.jena.query.QueryExecution;
> import org.apache.jena.query.QueryExecutionFactory;
> import org.apache.jena.query.ResultSet;
> import org.apache.jena.riot.Lang;
> import org.apache.jena.riot.RDFParser;
> import org.junit.Test;
> public class QueryTest {
> @Test
> public void testGraphQuery() {
> String query = "" +
> "PREFIX : <http://example.com/>\n" +
> "SELECT ?a\n" +
> "WHERE {\n" +
> " GRAPH :graph { :s :p ?a }\n" +
> " GRAPH :graph {\n" +
> " SELECT (?b AS ?a)\n" +
> " WHERE { :t :q ?b }\n" +
> " GROUP BY ?b\n" +
> " }\n" +
> "}\n";
> String data = "" +
> "@prefix : <http://example.com/> .\n" +
> ":graph {\n" +
> " :s :p \"a\" .\n" +
> " :t :q \"b\" .\n" +
> "}\n";
> Dataset ds = DatasetFactory.create();
> RDFParser.fromString(data).lang(Lang.TRIG).parse(ds);
> try (QueryExecution qe = QueryExecutionFactory.create(query, ds)) {
> qe.getContext().set(ARQ.optimization, true); // flipping this to false fixes the test
> ResultSet rs = qe.execSelect();
> if (rs.hasNext()) {
> System.out.println(rs.nextBinding());
> throw new AssertionError("Result set should be empty");
> }
> }
> }
> }
> {noformat}