[ https://issues.apache.org/jira/browse/JENA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457267#comment-16457267 ]
ASF GitHub Bot commented on JENA-1534:
--------------------------------------
Github user kinow commented on the issue:
https://github.com/apache/jena/pull/409
I'm learning a bit more about Jena internals (the weather is really bad here in the
antipodes :-) ), so I picked this PR to review.
I had a look at the previous tickets, and from JENA-1534 it looks like a
variable used inside the `EXISTS` was not being considered when choosing the join strategy, if I understand it correctly.
So I created an empty persistent dataset in TDB, loaded `books.ttl` from the
Jena code base, and loaded `yso.ttl` from SKOSMOS into the yso graph (I suspected it would
have some blank nodes, etc.).
I tried the query:
```sparql
SELECT * WHERE { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }
```
It returned a few results, quite quickly. So I went and tried `tdbquery`
from TDB (not TDB2), with the following command line (actually run in Eclipse, but
equivalent to):
```shell
tdbquery --explain \
  --loc=/home/kinow/Development/java/jena/jena/jena-fuseki2/jena-fuseki-core/run/databases/p1/ \
  "SELECT * WHERE { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }"
```
I also added `-Dlog4j.configuration=file:///tmp/log4j.properties -Dlog4j.debug=true`
to the JVM arguments to see the explain output.
With the code from master, the query took less than 2 seconds to execute and
produced the following algebra:
```text
12:36:03 INFO exec :: QUERY
  SELECT *
  WHERE
    { ?s ?p ?V0
      GRAPH ?g
        { ?sx ?p ?ox
          FILTER EXISTS { _:b0 ?p ?V0 }
        }
    }
12:36:03 INFO exec :: ALGEBRA
  (sequence
    (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
    (filter (exists
              (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
```
So that was a `sequence` (corresponding to `OpSequence` in Jena, which I'm
searching for in Eclipse to see how it's used). Then I checked out the branch:
```shell
$ git fetch --all
$ git fetch github refs/pull/409/head:pr-409
$ git checkout pr-409
$ mvn clean install -Pdev -DskipTests
```
I ran the same `tdbquery` command again, and now the algebra became:
```text
12:41:48 INFO exec :: QUERY
  SELECT *
  WHERE
    { ?s ?p ?V0
      GRAPH ?g
        { ?sx ?p ?ox
          FILTER EXISTS { _:b0 ?p ?V0 }
        }
    }
12:41:48 INFO exec :: ALGEBRA
  (join
    (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
    (filter (exists
              (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
12:41:48 INFO exec :: TDB
  (join
    (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
    (filter (exists
              (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
```
A join! If I understood the tickets correctly, that's exactly the intended
behaviour: before, the variable in the `EXISTS` was not taken into
consideration when deciding whether to produce a `JOIN` (`OpJoin`). The query also took much longer,
over 10 seconds (I guess SKOSMOS' YSO vocabulary has roughly 220K triples? The
Harry Potter books dataset probably has about 5? Anyway).
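To see why the join is the intended plan and not just a slower one, here is a toy sketch in plain Java (not Jena code) that evaluates the ticket's query both ways over two hypothetical one-triple graphs. It assumes a single fixed named graph instead of `GRAPH ?g` and treats `_:b0` as an unnamed variable inside the `EXISTS`; the class, method names and data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, not Jena code) of
 *   { ?s ?p ?V0 GRAPH <g> { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }
 * evaluated once as a join and once as a sequence.
 * Assumption: GRAPH ?g is simplified to one fixed named graph.
 */
public final class JoinVsSequence {

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    /** EXISTS { _:b0 <p> ?V0 } over graph g; v == null means ?V0 is unbound. */
    static boolean exists(List<Triple> g, String p, String v) {
        for (Triple t : g) {
            if (t.p.equals(p) && (v == null || t.o.equals(v))) {
                return true;
            }
        }
        return false;
    }

    /** Join: evaluate the GRAPH block on its own (?V0 not in scope), then merge on ?p. */
    static List<Map<String, String>> join(List<Triple> dft, List<Triple> g) {
        List<Map<String, String>> rhs = new ArrayList<>();
        for (Triple t : g) {
            if (exists(g, t.p, null)) {            // ?V0 is unbound inside the RHS
                Map<String, String> row = new HashMap<>();
                row.put("sx", t.s);
                row.put("p", t.p);
                row.put("ox", t.o);
                rhs.add(row);
            }
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Triple l : dft) {
            for (Map<String, String> r : rhs) {
                if (l.p.equals(r.get("p"))) {      // shared variable ?p must agree
                    Map<String, String> row = new HashMap<>(r);
                    row.put("s", l.s);
                    row.put("V0", l.o);
                    out.add(row);
                }
            }
        }
        return out;
    }

    /** Sequence: feed each LHS row into the RHS, so ?V0 is also substituted into EXISTS. */
    static List<Map<String, String>> sequence(List<Triple> dft, List<Triple> g) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Triple l : dft) {
            for (Triple t : g) {
                if (t.p.equals(l.p) && exists(g, l.p, l.o)) {   // ?V0 now bound to l.o
                    Map<String, String> row = new HashMap<>();
                    row.put("s", l.s);
                    row.put("p", l.p);
                    row.put("V0", l.o);
                    row.put("sx", t.s);
                    row.put("ox", t.o);
                    out.add(row);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Triple> dft = List.of(new Triple(":a", ":knows", ":c"));
        List<Triple> g = List.of(new Triple(":x", ":knows", ":y"));
        System.out.println("join:     " + join(dft, g).size() + " row(s)");     // 1 row(s)
        System.out.println("sequence: " + sequence(dft, g).size() + " row(s)"); // 0 row(s)
    }
}
```

With `:a :knows :c` in the default graph and `:x :knows :y` in the named graph, the join keeps one row, because inside the `GRAPH` block `?V0` is unbound and the `EXISTS` only needs some `:knows` triple. The sequence drops it, because it pushes `?V0 = :c` into the `EXISTS`, which then finds no match.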
So +1! LGTM :tada:
I'll probably spend some more time reading the code base to learn a bit more. I also
found two links
([1](https://gregheartsfield.com/2012/08/26/jena-arq-query-performance.html),
[2](https://www.slideshare.net/olafhartig/the-semantics-of-sparql)) with some
interesting content. If you have any other pointers for learning more about
it, or if I said anything silly, please feel free to correct me or share :-)
*PS: while reading the javadocs of the `Op*` classes, I noticed some typos in
`OpSequence`. Should I open a pull request for that, or just commit to master?*
```diff
-/** A "sequence" is a join-like operation where it is know that the
- * the output of one step can be fed into the input of the next
+/** A "sequence" is a join-like operation where it is known that
+ * the output of one step can be fed into the input of the next
  * (that is, no scoping issues arise). */
 public class OpSequence extends OpN
```
> Variables in EXISTS must be considered for the join strategy
> ------------------------------------------------------------
>
> Key: JENA-1534
> URL: https://issues.apache.org/jira/browse/JENA-1534
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 3.7.0
> Reporter: Andy Seaborne
> Assignee: Andy Seaborne
> Priority: Major
> Fix For: Jena 3.8.0
>
>
> This query has a join between the GRAPH and the pattern before it.
> {noformat}
> SELECT *
> WHERE
> { ?s ?p ?V0
> GRAPH ?g
> { ?sx ?p ?ox
> FILTER EXISTS { _:b0 ?p ?V0 }
> }
> }
> {noformat}
> The fact that {{?V0}} occurs in the LHS of the join in {{?s ?p ?V0}} and in the
> FILTER, but not in the rest of the RHS, means the "sequence" transform cannot
> be used.
> Contrast to:
> {noformat}
> SELECT *
> WHERE
> { ?s ?p ?V1
> GRAPH ?g
> { ?sx ?p ?ox
> FILTER EXISTS { _:b0 ?p ?V2 }
> }
> }
> {noformat}
> Now {{?V2}} is only in the FILTER so it is safe to transform the join.
> Note that {{?p}} appears in the LHS, which makes it defined in the EXISTS, and the
> sequence transform is possible.
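The scoping rule in the description above can be sketched as a small variable-set check. This is a hypothetical illustration in plain Java, not Jena's implementation and not necessarily how PR 409 does it: variables are plain strings without the `?`, the blank node `_:b0` is left out because it is local to the `EXISTS`, and the name `canUseSequence` is made up.

```java
import java.util.Set;

/**
 * Hypothetical sketch of the JENA-1534 scoping rule (not Jena's actual code).
 * Variables are plain strings without the leading '?'.
 */
public final class SequenceCheck {

    /**
     * Returns false if a variable mentioned by a FILTER on the right-hand side
     * (including inside EXISTS) is bound by the left-hand side but not by the
     * right-hand side's own pattern. Feeding LHS bindings into the RHS would then
     * change what the filter sees, so a plain join is required.
     */
    static boolean canUseSequence(Set<String> lhsVars,
                                  Set<String> rhsPatternVars,
                                  Set<String> rhsFilterVars) {
        for (String v : rhsFilterVars) {
            if (lhsVars.contains(v) && !rhsPatternVars.contains(v)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // First query: ?V0 is in the LHS and the EXISTS, but not in ?sx ?p ?ox.
        System.out.println(canUseSequence(
                Set.of("s", "p", "V0"), Set.of("g", "sx", "p", "ox"), Set.of("p", "V0"))); // false
        // Contrasting query: ?V2 appears only in the FILTER.
        System.out.println(canUseSequence(
                Set.of("s", "p", "V1"), Set.of("g", "sx", "p", "ox"), Set.of("p", "V2"))); // true
    }
}
```

In this sketch `?p` never blocks the transform, because the right-hand pattern `?sx ?p ?ox` binds it as well, so substituting the left-hand value cannot change what the `EXISTS` sees.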
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)