[jira] [Commented] (JENA-2107) RDF Star performance issue with non-concrete node triples

Jira Tue, 18 May 2021 03:30:04 -0700


    [ https://issues.apache.org/jira/browse/JENA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346772#comment-17346772 ]


Lorenz Bühmann commented on JENA-2107:
--------------------------------------

Some numbers:

#Triples = {{50,367}}

The shape of the data is:
 * 100 nodes, each with a directed connection to every other node, i.e. 9900 triples of the form <v1,e,v2>
 * for each connection, 4 triples making statements about that connection
 * plus some other data about the nodes themselves
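
The triple count can be sanity-checked against this description. The following is only a small sketch of that arithmetic, assuming each statement about a connection is exactly one triple; the class and method names are made up for illustration.

```java
/** Back-of-the-envelope check of the data shape; illustrative names only. */
final class DataShapeEstimate {

    /** Directed connections between n nodes without self-loops: n * (n - 1). */
    static long edgeTriples(long nodes) {
        return nodes * (nodes - 1);
    }

    /** Triples about the connections, assuming one triple per statement. */
    static long annotationTriples(long edges, long statementsPerEdge) {
        return edges * statementsPerEdge;
    }

    /** Whatever is left of the total is the data about the nodes themselves. */
    static long remainingTriples(long total, long edges, long annotations) {
        return total - edges - annotations;
    }

    public static void main(String[] args) {
        long total = 50_367;                          // reported #Triples
        long edges = edgeTriples(100);                // <v1,e,v2> triples
        long annotations = annotationTriples(edges, 4);
        long rest = remainingTriples(total, edges, annotations);
        System.out.println("edges=" + edges
                + " annotations=" + annotations
                + " other=" + rest);
    }
}
```

Under that assumption, 9900 + 39,600 = 49,500 triples describe the connections, which leaves 867 of the 50,367 triples for the node data.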

A simplified version of the query executed on the data is:
{code:sql}
SELECT  (count(*) as ?cnt) {
?src <p> ?target .
 
 <<?src <p> ?target>> <p1> ?val1 ;
                      <p2> ?val2 
}{code}
h4. Runtimes:
{code:bash}
sparql --time --repeat 2,5 --data ... --query ...{code}
h5. Jena 4.0.0

Time: 81.749 sec
Total time: 403.872 sec for repeat count of 5 : average: 80.774
h5. Jena 4.1.0-SNAPSHOT with fix

Time: 0.039 sec
Total time: 0.422 sec for repeat count of 5 : average: 0.084

> RDF Star performance issue with non-concrete node triples
> ---------------------------------------------------------
>
>                 Key: JENA-2107
>                 URL: https://issues.apache.org/jira/browse/JENA-2107
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.17.0, Jena 4.0.0
>            Reporter: Lorenz Bühmann
>            Priority: Critical
>             Fix For: Jena 4.1.0
>
>
> The following graph pattern is not evaluated efficiently (it results in a 
> full scan per binding) because the second triple pattern doesn't take 
> advantage of the bindings generated by evaluating the first one:
> {code:java}
> ?s <p> ?o .  
> << ?s <p> ?o >> <p2> ?v .
> {code}
> A possible fix would be to adapt the method {{rdfStarTripleSub()}} in class [SolverRX3.java|https://github.com/apache/jena/blob/2efff8a00b4ffa82751cf46c8a3fed84b6ff3090/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/solver/SolverRX3.java#L63-L71] by changing the beginning to
> {code:java}
> private static Iterator<Binding> rdfStarTripleSub(Binding input, Triple xPattern, ExecutionContext execCxt) {
>         Triple tPattern = Substitute.substitute(xPattern, input);
> {code}
> With this change we went from 75s on a very small dataset (50k triples) to 
> near-instant response times.
> If this fix is correct and doesn't break anything, the same change might 
> also fix the quads counterpart in the {{SolverRX4}} class.
>  
> Note: for tdbquery, this pattern seems to be evaluated in a different place. 
> At least, we couldn't find any performance improvement there; it is still 
> horribly slow.
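
The effect described in the quoted report can be modelled with a toy example. This is a sketch under stated assumptions, not Jena code: the class `SubstitutionSketch`, its methods, and the `HashMap` standing in for a store index are all hypothetical, and each connection carries only one annotation instead of four. It contrasts matching a non-concrete quoted-triple pattern (scan everything, then filter against the binding) with substituting the binding first so that the now-concrete pattern can be looked up directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these names exist in Jena. */
final class SubstitutionSketch {
    /** One annotation << s <p> o >> <p2> value, stored as {s, o, value}. */
    private final List<String[]> annotations = new ArrayList<>();
    /** Stand-in for a store index keyed by the quoted triple's subject and object. */
    private final Map<String, List<String>> index = new HashMap<>();
    /** Number of stored annotations inspected since the last reset. */
    long inspected = 0;

    void add(String s, String o, String value) {
        annotations.add(new String[] {s, o, value});
        index.computeIfAbsent(s + " " + o, k -> new ArrayList<>()).add(value);
    }

    /** Pattern left non-concrete: scan all annotations, then filter by the binding. */
    List<String> matchWithoutSubstitution(String boundS, String boundO) {
        List<String> out = new ArrayList<>();
        for (String[] a : annotations) {
            inspected++;
            if (a[0].equals(boundS) && a[1].equals(boundO)) {
                out.add(a[2]);
            }
        }
        return out;
    }

    /** Binding substituted first: the concrete quoted triple is a direct index key. */
    List<String> matchWithSubstitution(String boundS, String boundO) {
        List<String> hits = index.getOrDefault(boundS + " " + boundO, List.of());
        inspected += hits.size();
        return hits;
    }

    /**
     * Builds n * (n - 1) directed connections with one annotation each and
     * evaluates both strategies once per connection (one binding each).
     * Returns {inspectedScan, inspectedSubstituted, resultsScan, resultsSubstituted}.
     */
    static long[] compare(int n) {
        SubstitutionSketch store = new SubstitutionSketch();
        List<String[]> bindings = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                String s = "v" + i;
                String o = "v" + j;
                store.add(s, o, "val" + i + "_" + j);
                bindings.add(new String[] {s, o});
            }
        }
        long resultsScan = 0;
        for (String[] b : bindings) {
            resultsScan += store.matchWithoutSubstitution(b[0], b[1]).size();
        }
        long inspectedScan = store.inspected;

        store.inspected = 0;
        long resultsSubstituted = 0;
        for (String[] b : bindings) {
            resultsSubstituted += store.matchWithSubstitution(b[0], b[1]).size();
        }
        return new long[] {inspectedScan, store.inspected, resultsScan, resultsSubstituted};
    }

    public static void main(String[] args) {
        long[] r = compare(100);
        System.out.println("inspected without substitution: " + r[0]);
        System.out.println("inspected with substitution:    " + r[1]);
        System.out.println("results (both strategies):      " + r[2] + " / " + r[3]);
    }
}
```

In this model, 100 nodes give 9900 bindings, so the scan variant inspects 9900 × 9900 = 98,010,000 stored annotations while the substituted variant inspects 9900, and both return the same results. The real speedup in ARQ depends on how the underlying graph answers the concrete pattern, which this sketch does not attempt to reproduce.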



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
