[
https://issues.apache.org/jira/browse/JENA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133975#comment-15133975
]
Benjamin Geer edited comment on JENA-1121 at 2/5/16 10:57 AM:
--------------------------------------------------------------
It's useful to know that {{COALESCE}} can block optimisations in Fuseki 2.3.1.
Unfortunately we use {{COALESCE}} a lot, and getting rid of it would require
rewriting a lot of application code.
Here's another query that doesn't use {{COALESCE}}, and in which the
optimisation of {{MINUS}} seems to make it run very fast on Fuseki 2.3.0 and
very slowly on Fuseki 2.3.1. It also illustrates the issue with the placement
of {{MINUS}}, which in this case seems counter-intuitive to me.
{noformat}
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix knora-base: <http://www.knora.org/ontology/knora-base#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?seqnum ?sourceObject ?firstprop ?preview ?internalFilename
       ?internalMimeType ?originalFilename ?dimX ?dimY ?qualityLevel
WHERE {
    BIND(IRI("http://data.knora.org/c5058f3a") as ?resource)
    MINUS {
        ?resource knora-base:isDeleted true .
    }
    ?resource rdf:type ?resourceClass .
    ?resourceClass rdfs:subClassOf+ knora-base:Resource .
    # Find something that relates to this resource via isPartOf.
    ?linkingProp rdfs:subPropertyOf+ knora-base:isPartOf .
    ?seqProp rdfs:subPropertyOf+ knora-base:seqnum .
    ?sourceObject ?linkingProp ?resource .
    ?sourceObject ?seqProp ?seqnumVal .
    ?seqnumVal knora-base:valueHasInteger ?seqnum .
    ?sourceObject rdfs:label ?firstprop .
    OPTIONAL {
        ?fileValueProp rdfs:subPropertyOf* knora-base:hasFileValue .
        ?sourceObject ?fileValueProp ?preview .
        MINUS {
            ?preview knora-base:isDeleted true .
        }
        ?preview a knora-base:StillImageFileValue .
        ?preview knora-base:isPreview true .
        ?preview knora-base:internalMimeType ?internalMimeType ;
                 knora-base:originalFilename ?originalFilename ;
                 knora-base:internalFilename ?internalFilename ;
                 knora-base:dimX ?dimX ;
                 knora-base:dimY ?dimY ;
                 knora-base:qualityLevel ?qualityLevel .
    }
    MINUS {
        ?sourceObject knora-base:isDeleted true .
    }
}
{noformat}
With Fuseki 2.3.0, this query runs in 250 ms. With Fuseki 2.3.1, it takes 38
seconds.
If I remove the {{MINUS}} that's inside the {{OPTIONAL}}, Fuseki 2.3.1 runs it
in 2.8 seconds.
If I then move the first {{MINUS}} in the query down one line (so it's just after
{{?resource rdf:type ?resourceClass .}}), Fuseki 2.3.1 runs it in 690 ms. This
is what seems counter-intuitive to me. Fuseki knows the IRI of {{?resource}}
from the {{BIND}} statement. I don't understand why checking its {{rdf:type}}
first makes the {{MINUS}} faster. In contrast, making the same change doesn't
seem to have any effect on the performance of Fuseki 2.3.0.
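To make the two changes concrete, here is a sketch of the query with both of them applied, reconstructed from the description above. Only the two places marked with "Change" comments differ from the original query; everything else is unchanged.
{noformat}
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix knora-base: <http://www.knora.org/ontology/knora-base#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?seqnum ?sourceObject ?firstprop ?preview ?internalFilename
       ?internalMimeType ?originalFilename ?dimX ?dimY ?qualityLevel
WHERE {
    BIND(IRI("http://data.knora.org/c5058f3a") as ?resource)
    ?resource rdf:type ?resourceClass .
    # Change 2: this MINUS originally came directly after the BIND.
    MINUS {
        ?resource knora-base:isDeleted true .
    }
    ?resourceClass rdfs:subClassOf+ knora-base:Resource .
    # Find something that relates to this resource via isPartOf.
    ?linkingProp rdfs:subPropertyOf+ knora-base:isPartOf .
    ?seqProp rdfs:subPropertyOf+ knora-base:seqnum .
    ?sourceObject ?linkingProp ?resource .
    ?sourceObject ?seqProp ?seqnumVal .
    ?seqnumVal knora-base:valueHasInteger ?seqnum .
    ?sourceObject rdfs:label ?firstprop .
    OPTIONAL {
        ?fileValueProp rdfs:subPropertyOf* knora-base:hasFileValue .
        ?sourceObject ?fileValueProp ?preview .
        # Change 1: the MINUS on ?preview has been removed from here.
        ?preview a knora-base:StillImageFileValue .
        ?preview knora-base:isPreview true .
        ?preview knora-base:internalMimeType ?internalMimeType ;
                 knora-base:originalFilename ?originalFilename ;
                 knora-base:internalFilename ?internalFilename ;
                 knora-base:dimX ?dimX ;
                 knora-base:dimY ?dimY ;
                 knora-base:qualityLevel ?qualityLevel .
    }
    MINUS {
        ?sourceObject knora-base:isDeleted true .
    }
}
{noformat}
This is the variant that runs in 690 ms on Fuseki 2.3.1.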
Note that since {{knora-base:isDeleted}} doesn't occur in the test data, none
of these {{MINUS}} statements should eliminate any results.
> Performance regression in Jena 3.0.1 / Fuseki 2.3.1
> ---------------------------------------------------
>
> Key: JENA-1121
> URL: https://issues.apache.org/jira/browse/JENA-1121
> Project: Apache Jena
> Issue Type: Bug
> Components: Jena
> Affects Versions: Jena 3.0.1, Fuseki 2.3.1, Jena 3.1.0, Fuseki 2.4.0
> Environment: Mac OS X 10.10.5, iMac, 3.4 GHz Intel Core i7, 32 GB RAM
> Reporter: Benjamin Geer
> Priority: Critical
> Labels: performance
>
> We seem to have encountered a severe performance regression in Jena 3.0.1 /
> Fuseki 2.3.1 as compared with Jena 3.0.0 / Fuseki 2.3.0. A number of our
> queries are running between 2 and 20 times slower. Here's one small example
> with configuration for Fuseki. With Fuseki 2.3.0, the query below takes about
> 200 milliseconds. With Fuseki 2.3.1, it takes 9 seconds. I've also tried it
> with the latest Fuseki snapshot
> (apache-jena-fuseki-2.4.0-20160117.183513-33.zip), and got the same result as
> with the 2.3.1 release.
> Here's the test data and configuration:
> https://www.dropbox.com/s/b9aepexij5e7noj/jena-performance-test.zip?dl=0
> Here's the query:
> {noformat}
> prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> prefix knora-base: <http://www.knora.org/ontology/knora-base#>
> SELECT DISTINCT
>     ?resourceIri
>     ?resourceLabel
>     (SAMPLE(?anyMatch) AS ?match)
> WHERE {
>     BIND(STR("de") AS ?preferredLanguage)
>     BIND(STR("en") AS ?fallbackLanguage)
>     ?s <http://jena.apache.org/text#query> 'Zeitglöcklein' .
>     MINUS {
>         ?s knora-base:isDeleted true .
>     }
>     OPTIONAL {
>         ?s a ?valueObjectType .
>         ?valueObjectType rdfs:subClassOf+ knora-base:Value .
>         ?resIri ?resourceProperty ?s .
>         ?s knora-base:valueHasString ?literal .
>         OPTIONAL {
>             ?resourceProperty rdfs:label
>                 ?preferredLanguageResourcePropertyLabel .
>             FILTER (LANG(?preferredLanguageResourcePropertyLabel) =
>                 ?preferredLanguage) .
>         }
>         OPTIONAL {
>             ?resourceProperty rdfs:label
>                 ?fallbackLanguageResourcePropertyLabel .
>             FILTER (LANG(?fallbackLanguageResourcePropertyLabel) =
>                 ?fallbackLanguage) .
>         }
>         OPTIONAL {
>             ?resourceProperty rdfs:label ?anyLanguageResourcePropertyLabel .
>         }
>         BIND(COALESCE(str(?preferredLanguageResourcePropertyLabel),
>             str(?fallbackLanguageResourcePropertyLabel),
>             str(?anyLanguageResourcePropertyLabel)) AS ?propertyLabel)
>         BIND(CONCAT(STR(?valueObjectType), "|", STR(?propertyLabel), "|",
>             STR(?literal)) AS ?anyMatch)
>         MINUS {
>             ?resIri knora-base:isDeleted true .
>         }
>     }
>     BIND(COALESCE(?resIri, ?s) AS ?resourceIri)
>     ?resourceIri a ?resourceClass .
>     ?resourceClass rdfs:subClassOf+ knora-base:Resource .
>     ?resourceIri rdfs:label ?resourceLabel .
> }
> GROUP BY
>     ?resourceIri
>     ?resourceLabel
> ORDER BY ?resourceIri
> {noformat}
> Best regards,
> Benjamin Geer