[ https://issues.apache.org/jira/browse/JENA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346475#comment-17346475 ]
Claus Stadler commented on JENA-2107:
-------------------------------------
For the Dataset-based implementation, we could subclass DatasetGraphWrapper and
override its find methods to keep track of the sizes of the internal iterators.
After running a query on such a dataset instance, one could then check whether
only a specific number of tuples have been touched.
Alternatively, one could track the arguments passed to find and check whether
they match an expected sequence (or set) of reference arguments, which would be
more traceable than mere counts.
Sketch:
{code:java}
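import java.util.*;

import org.apache.jena.atlas.iterator.Iter;
import org.apache.jena.graph.Node;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.DatasetGraphWrapper;
import org.apache.jena.sparql.core.Quad;

class TrackingDatasetGraph extends DatasetGraphWrapper {
    protected long numSeenTuples = 0;
    // Distinct find patterns in first-seen order; use an ArrayList to keep every call
    protected Collection<List<Node>> seenArgs = new LinkedHashSet<>();

    TrackingDatasetGraph(DatasetGraph dsg) { super(dsg); }

    @Override
    public Iterator<Quad> find(Node g, Node s, Node p, Node o) {
        seenArgs.add(Arrays.asList(g, s, p, o));
        // Materialize the result so the number of touched tuples can be counted
        List<Quad> materialized = Iter.toList(getR().find(g, s, p, o));
        numSeenTuples += materialized.size();
        return materialized.iterator();
    }
    // Other find variants (e.g. find(Quad), findNG) would need the same treatment
}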
{code}
It's just somewhat cumbersome having to repeat the same pattern for
NodeTupleTable(Wrapper).
Having at least a single test case would already be beneficial for detecting
performance regressions of this kind while work on RDF star progresses.
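For experimenting with the tracking idea without Jena on the classpath, here is a minimal stand-alone sketch. All names ({{QuadStore}}, {{ListQuadStore}}, {{TrackingQuadStore}}) and the {{"*"}} wildcard are hypothetical stand-ins for {{DatasetGraph}}, a plain in-memory store and {{Node.ANY}}; they are not Jena API. The decorator records every find pattern in call order together with the number of tuples returned, so a test can assert on either.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal store interface; "*" acts as a wildcard (like Node.ANY).
interface QuadStore {
    Iterator<List<String>> find(String g, String s, String p, String o);
}

// In-memory store: a linear scan over all quads, matching each position.
class ListQuadStore implements QuadStore {
    private final List<List<String>> quads = new ArrayList<>();

    void add(String g, String s, String p, String o) {
        quads.add(Arrays.asList(g, s, p, o));
    }

    @Override
    public Iterator<List<String>> find(String g, String s, String p, String o) {
        List<String> pattern = Arrays.asList(g, s, p, o);
        List<List<String>> result = new ArrayList<>();
        for (List<String> q : quads) {
            boolean match = true;
            for (int i = 0; i < 4 && match; i++) {
                String term = pattern.get(i);
                match = term.equals("*") || term.equals(q.get(i));
            }
            if (match) result.add(q);
        }
        return result.iterator();
    }
}

// Decorator that records find arguments (in call order) and counts returned tuples.
class TrackingQuadStore implements QuadStore {
    private final QuadStore base;
    long numSeenTuples = 0;
    final List<List<String>> seenArgs = new ArrayList<>();

    TrackingQuadStore(QuadStore base) { this.base = base; }

    @Override
    public Iterator<List<String>> find(String g, String s, String p, String o) {
        seenArgs.add(Arrays.asList(g, s, p, o));
        List<List<String>> materialized = new ArrayList<>();
        base.find(g, s, p, o).forEachRemaining(materialized::add);
        numSeenTuples += materialized.size();
        return materialized.iterator();
    }
}
{code}

A test would run the code under test against a {{TrackingQuadStore}} and then compare {{seenArgs}} with an expected list (or put an upper bound on {{numSeenTuples}}); swapping the {{ArrayList}} for a {{LinkedHashSet}} would ignore repeated calls, as in the sketch above.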
> RDF Star performance issue with non-concrete node triples
> ---------------------------------------------------------
>
> Key: JENA-2107
> URL: https://issues.apache.org/jira/browse/JENA-2107
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Affects Versions: Jena 3.17.0, Jena 4.0.0
> Reporter: Lorenz Bühmann
> Priority: Critical
> Fix For: Jena 4.1.0
>
>
> The following graph pattern is not evaluated efficiently (it results in a
> full scan per binding) because the second triple pattern doesn't take
> advantage of the bindings generated by evaluating the first one:
> {code:java}
> ?s <p> ?o .
> << ?s <p> ?o >> <p2> ?v .
> {code}
> A possible fix would be to adapt the method {{rdfStarTripleSub()}} in class
> [SolverRX3.java|https://github.com/apache/jena/blob/2efff8a00b4ffa82751cf46c8a3fed84b6ff3090/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/solver/SolverRX3.java#L63-L71]
> by changing the beginning to
> {code:java}
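> private static Iterator<Binding> rdfStarTripleSub(Binding input, Triple xPattern, ExecutionContext execCxt) {
>     // Apply the incoming binding first, so the embedded triple becomes concrete where possible
>     Triple tPattern = Substitute.substitute(xPattern, input);
>     // ... rest of the method unchanged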
> {code}
> We went from 75s for a very small dataset (50k triples) to near-instant
> response times.
> If this fix is correct and doesn't break anything, the same approach might
> also fix its quad counterpart in the {{SolverRX4}} class.
>
> Note: for tdbquery, this seems to be evaluated somewhere else. At least, we
> couldn't see any performance improvement; it's still horribly slow.
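The issue described above can be illustrated with a toy model. This is a stand-alone sketch, not Jena's actual evaluation code; {{RdfStarToyModel}}, its methods, and the convention that terms starting with {{?}} are variables are all hypothetical. Quoted triples are indexed by their concrete form, so a fully concrete pattern is a single lookup, while a pattern that still contains variables has to scan every quoted triple. {{substitute}} mirrors the idea of {{Substitute.substitute(xPattern, input)}}: applying the binding from {{?s <p> ?o}} before matching {{<< ?s <p> ?o >>}} turns the per-binding scan into a lookup.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of "?s <p> ?o . << ?s <p> ?o >> <p2> ?v"; not Jena code.
class RdfStarToyModel {
    // Quoted triple [s, p, o] -> values of ?v in "<< s p o >> <p2> ?v".
    final Map<List<String>, List<String>> annotations = new LinkedHashMap<>();
    // Number of quoted-triple entries looked at while matching.
    long candidatesExamined = 0;

    static boolean isVar(String term) { return term.startsWith("?"); }

    // Replace bound variables by their values (the idea behind Substitute.substitute).
    static List<String> substitute(List<String> pattern, Map<String, String> binding) {
        List<String> out = new ArrayList<>();
        for (String t : pattern) {
            out.add(isVar(t) && binding.containsKey(t) ? binding.get(t) : t);
        }
        return out;
    }

    // True if the concrete triple agrees with the pattern's constants and with the binding.
    static boolean compatible(List<String> pattern, List<String> triple, Map<String, String> binding) {
        for (int i = 0; i < 3; i++) {
            String t = pattern.get(i);
            String expected = isVar(t) ? binding.get(t) : t; // null means unbound variable
            if (expected != null && !expected.equals(triple.get(i))) return false;
        }
        return true;
    }

    // Match the quoted-triple pattern and return the ?v values compatible with the binding.
    List<String> matchQuoted(List<String> pattern, Map<String, String> binding) {
        List<String> result = new ArrayList<>();
        if (pattern.stream().noneMatch(RdfStarToyModel::isVar)) {
            // Fully concrete pattern: a single index lookup.
            candidatesExamined++;
            result.addAll(annotations.getOrDefault(pattern, List.of()));
        } else {
            // Variables remain: every quoted triple is a candidate (full scan per binding).
            for (Map.Entry<List<String>, List<String>> e : annotations.entrySet()) {
                candidatesExamined++;
                if (compatible(pattern, e.getKey(), binding)) result.addAll(e.getValue());
            }
        }
        return result;
    }

    // Evaluate the second pattern once per binding of the first; returns the number of rows.
    int evaluate(List<Map<String, String>> bindings, boolean substituteFirst) {
        List<String> quoted = List.of("?s", "<p>", "?o");
        int rows = 0;
        for (Map<String, String> b : bindings) {
            List<String> pattern = substituteFirst ? substitute(quoted, b) : quoted;
            rows += matchQuoted(pattern, b).size();
        }
        return rows;
    }
}
{code}

With n asserted and annotated triples, the unsubstituted variant examines n*n candidates versus n with substitution, while both return the same n rows. A check on such a count is the kind of regression test suggested in the comment above.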
--
This message was sent by Atlassian Jira
(v8.3.4#803005)