Github user kinow commented on the issue:
https://github.com/apache/jena/pull/409
Learning a bit more about Jena internals (the weather is really bad here in the
antipodes :-) ), so I picked this PR to review.
I had a look at the previous tickets, and from JENA-1534 it looks like the
query was not taking a variable into account when deciding on a join, I think.
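Here is a much simplified, hypothetical sketch of the kind of check involved. It is not Jena's actual code: the class, the record and the rule are my own illustration. When a join is executed as a sequence, bindings from the left side are fed into the right side. Any variable that the right side mentions only inside a `FILTER` (for example inside `EXISTS`), but that the left side binds, would then be silently substituted. In that case the join should stay a join.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical toy check, NOT Jena's implementation: can a join be run as a sequence? */
public class SequenceCheck {

    /** Right-hand side of a join: variables bound by its pattern, and variables
     *  mentioned in its filters (including inside FILTER EXISTS). */
    record RightSide(Set<String> patternVars, Set<String> filterVars) {}

    /** Under join semantics, filter variables that the right pattern does not bind
     *  are unbound. If the left side binds them, a sequence would substitute them
     *  and could change the filter's result, so the rewrite is not safe. */
    static boolean safeAsSequence(Set<String> leftVars, RightSide right) {
        Set<String> leaking = new HashSet<>(right.filterVars());
        leaking.removeAll(right.patternVars()); // keep only filter-only variables
        leaking.retainAll(leftVars);            // ... that the left side would feed in
        return leaking.isEmpty();
    }

    public static void main(String[] args) {
        // Shaped after { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }
        Set<String> left = Set.of("s", "p", "V0");
        RightSide right = new RightSide(Set.of("g", "sx", "p", "ox"), Set.of("p", "V0"));
        System.out.println(safeAsSequence(left, right)); // false: ?V0 would leak into EXISTS

        // A check that ignores the filter's variables wrongly concludes "safe".
        RightSide filterIgnored = new RightSide(right.patternVars(), Set.of());
        System.out.println(safeAsSequence(left, filterIgnored)); // true
    }
}
```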
So I created an empty persistent dataset in TDB, loaded `books.ttl` from the
Jena code base, and loaded `yso.ttl` from SKOSMOS into the yso graph (I suspected
it would have some blank nodes, etc.).
Then I tried the query:
```sparql
SELECT * WHERE { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }
```
It returned a few results, quite quickly. So I went and tried `tdbquery`
from TDB (not TDB2), with the following command line (actually run in Eclipse,
but equivalent to):
```shell
tdbquery --explain \
  --loc=/home/kinow/Development/java/jena/jena/jena-fuseki2/jena-fuseki-core/run/databases/p1/ \
  "SELECT * WHERE { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }"
```
I also added the JVM options `-Dlog4j.configuration=file:///tmp/log4j.properties
-Dlog4j.debug=true` to see the explain output.
With the code from master, the query took less than 2 seconds to execute and
produced the following algebra:
```text
12:36:03 INFO exec :: QUERY
  SELECT *
  WHERE
    { ?s ?p ?V0
      GRAPH ?g
        { ?sx ?p ?ox
          FILTER EXISTS { _:b0 ?p ?V0 }
        }
    }
12:36:03 INFO exec :: ALGEBRA
  (sequence
    (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
    (filter (exists
              (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
```
So that was a `sequence` (corresponding to `OpSequence` in Jena; I searched for
its occurrences in Eclipse to see how it is used). Then I checked out the branch:
```shell
$ git fetch --all
$ git fetch github refs/pull/409/head:pr-409
$ git checkout pr-409
$ mvn clean install -Pdev -DskipTests
```
I ran the same `tdbquery` command again, and now the algebra became:
```text
12:41:48 INFO exec :: QUERY
  SELECT *
  WHERE
    { ?s ?p ?V0
      GRAPH ?g
        { ?sx ?p ?ox
          FILTER EXISTS { _:b0 ?p ?V0 }
        }
    }
12:41:48 INFO exec :: ALGEBRA
  (join
    (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
    (filter (exists
              (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
12:41:48 INFO exec :: TDB
  (join
    (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?p ?V0))
    (filter (exists
              (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
```
A join! If I understood the tickets correctly, that is exactly the intended
behaviour: before, the variable inside the `EXISTS` was not taken into
consideration when deciding to produce a `join` (`OpJoin`). The query also took
much longer, more than 10 seconds (I guess SKOSMOS' YSO vocabulary has something
like 220K triples? The Harry Potter books dataset should have about 5? Anyway).
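To convince myself why the `join` matters, here is a toy model in plain Java. It is hypothetical: the class, the two-quad dataset and the evaluation functions are made up for illustration and are not how ARQ evaluates queries. Under `join`, the `GRAPH ?g { ... }` side is evaluated on its own, so `?V0` is unbound inside the `EXISTS`. Under `sequence`, each left-hand row is fed in as input, so `?V0` gets substituted and the `EXISTS` can give a different answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy evaluator (NOT ARQ) for
 *  { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } } */
public class SeqVsJoin {
    record Quad(String g, String s, String p, String o) {}

    static final String DEFAULT = "default";
    // Made-up data: one triple in the default graph, one in named graph g1.
    static final List<Quad> DATA = List.of(
        new Quad(DEFAULT, "s1", "p", "a"),
        new Quad("g1", "x", "p", "b"));

    /** Left side: ?s ?p ?V0 over the default graph. */
    static List<Map<String, String>> left() {
        List<Map<String, String>> rows = new ArrayList<>();
        for (Quad q : DATA)
            if (q.g().equals(DEFAULT)) rows.add(Map.of("s", q.s(), "p", q.p(), "V0", q.o()));
        return rows;
    }

    /** EXISTS { _:b0 ?p ?V0 } in graph g; a null v0 means ?V0 is unbound (matches any object). */
    static boolean exists(String g, String p, String v0) {
        for (Quad q : DATA)
            if (q.g().equals(g) && q.p().equals(p) && (v0 == null || q.o().equals(v0))) return true;
        return false;
    }

    /** Right side: GRAPH ?g { ?sx ?p ?ox FILTER EXISTS {...} }, evaluated with an input binding. */
    static List<Map<String, String>> right(Map<String, String> input) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (Quad q : DATA) {
            if (q.g().equals(DEFAULT)) continue;                                     // named graphs only
            if (input.containsKey("p") && !input.get("p").equals(q.p())) continue;  // respect input ?p
            if (!exists(q.g(), q.p(), input.get("V0"))) continue;                   // the FILTER EXISTS
            Map<String, String> row = new HashMap<>(input);
            row.putAll(Map.of("g", q.g(), "sx", q.s(), "p", q.p(), "ox", q.o()));
            rows.add(row);
        }
        return rows;
    }

    /** Two rows are compatible if they agree on every shared variable. */
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (var e : a.entrySet())
            if (b.containsKey(e.getKey()) && !b.get(e.getKey()).equals(e.getValue())) return false;
        return true;
    }

    /** join: evaluate both sides independently, then merge compatible rows. */
    static List<Map<String, String>> join() {
        List<Map<String, String>> out = new ArrayList<>();
        List<Map<String, String>> rightRows = right(Map.of()); // no bindings from the left
        for (Map<String, String> l : left())
            for (Map<String, String> r : rightRows)
                if (compatible(l, r)) {
                    Map<String, String> merged = new HashMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
        return out;
    }

    /** sequence: feed each left row into the right side as its input. */
    static List<Map<String, String>> sequence() {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left()) out.addAll(right(l));
        return out;
    }

    public static void main(String[] args) {
        System.out.println("join:     " + join().size() + " row(s)");     // join:     1 row(s)
        System.out.println("sequence: " + sequence().size() + " row(s)"); // sequence: 0 row(s)
    }
}
```

With this made-up data, the `join` plan returns one row and the `sequence` plan returns none. It also suggests one possible reason why the join can be slower, though this is a guess and not something I measured: the right-hand side is evaluated over every named-graph quad, without any of the left side's bindings to narrow it down.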
So +1! LGTM :tada:
I will probably spend some time reading more of the code base to see if I can
learn a bit more. I also found two posts
([1](https://gregheartsfield.com/2012/08/26/jena-arq-query-performance.html),
[2](https://www.slideshare.net/olafhartig/the-semantics-of-sparql)) with some
interesting content. If you have any other pointers for learning more about
it, or if I said anything silly, please feel free to share or correct me :-)
*PS: while reading the javadocs of the `Op*` classes, I noticed some typos in
`OpSequence`. Should I open a pull request for that, or just commit to master?*
```diff
-/** A "sequence" is a join-like operation where it is know that the
- * the output of one step can be fed into the input of the next
+/** A "sequence" is a join-like operation where it is known that
+ * the output of one step can be fed into the input of the next
  * (that is, no scoping issues arise). */
 public class OpSequence extends OpN
```