Rupert Westenthaler created MARMOTTA-603:
--------------------------------------------
Summary: SPARQL OPTIONAL issues
Key: MARMOTTA-603
URL: https://issues.apache.org/jira/browse/MARMOTTA-603
Project: Marmotta
Issue Type: Bug
Components: KiWi Triple Store
Affects Versions: 3.3.0
Reporter: Rupert Westenthaler
Priority: Critical
The SPARQL implementation of the KiWi triple store seems to evaluate the
OPTIONAL parts of SPARQL queries incorrectly. Test data and test queries are
provided below.
h2. Data
{code}
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<urn:test.org:place.1> rdf:type schema:Place ;
    schema:geo <urn:test.org:geo.1> ;
    schema:name "Place 1" .
<urn:test.org:geo.1> rdf:type schema:GeoCoordinates ;
    schema:latitude "16"^^xsd:double ;
    schema:longitude "17"^^xsd:double ;
    schema:elevation "123"^^xsd:int .
<urn:test.org:place.2> rdf:type schema:Place ;
    schema:geo <urn:test.org:geo.2> ;
    schema:name "Place 2" .
<urn:test.org:geo.2> rdf:type schema:GeoCoordinates ;
    schema:latitude "15"^^xsd:double ;
    schema:longitude "16"^^xsd:double ;
    schema:elevation "99"^^xsd:int .
<urn:test.org:place.3> rdf:type schema:Place ;
    schema:geo <urn:test.org:geo.3> ;
    schema:name "Place 3" .
<urn:test.org:geo.3> rdf:type schema:GeoCoordinates ;
    schema:latitude "15"^^xsd:double ;
    schema:longitude "17"^^xsd:double .
<urn:test.org:place.4> rdf:type schema:Place ;
    schema:geo <urn:test.org:geo.4> ;
    schema:name "Place 4" .
<urn:test.org:geo.4> rdf:type schema:GeoCoordinates ;
    schema:longitude "17"^^xsd:double ;
    schema:elevation "123"^^xsd:int .
{code}
Note that `geo.1` and `geo.2` define latitude, longitude and elevation.
`geo.3` has no elevation, and `geo.4` is missing the latitude to simulate
invalid geo coordinate data.
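As a quick cross-check of the data, here is a minimal Python sketch that lists which of the three properties each geo node lacks. The dictionary `GEO` and the short identifiers (`geo.1` for `<urn:test.org:geo.1>`, `latitude` for `schema:latitude`) are my own shorthand for illustration and are not part of Marmotta.

```python
# Geo nodes from the Data section (property -> lexical value).
# Short names stand in for the full URIs; this is illustrative shorthand only.
GEO = {
    "geo.1": {"latitude": "16", "longitude": "17", "elevation": "123"},
    "geo.2": {"latitude": "15", "longitude": "16", "elevation": "99"},
    "geo.3": {"latitude": "15", "longitude": "17"},
    "geo.4": {"longitude": "17", "elevation": "123"},
}
PROPERTIES = ("latitude", "longitude", "elevation")

# For every geo node, collect the properties it does not define.
missing = {node: [p for p in PROPERTIES if p not in props]
           for node, props in GEO.items()}
print(missing)
# {'geo.1': [], 'geo.2': [], 'geo.3': ['elevation'], 'geo.4': ['latitude']}
```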
h2. Test Case 1
The following query uses an OPTIONAL graph pattern containing both
`schema:latitude` and `schema:longitude`. This models a user who wants lat/long
values only for locations that define both.
{code}
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
?entity schema:geo ?location
OPTIONAL {
?location schema:latitude ?lat .
?location schema:longitude ?long .
}
}
{code}
This translates to the algebra
{code}
(base <http://example/base/>
(prefix ((schema: <http://schema.org/>)
(rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
(leftjoin
(bgp (triple ?entity schema:geo ?location))
(bgp
(triple ?location schema:latitude ?lat)
(triple ?location schema:longitude ?long)
))))
{code}
The expected results are
{code}
entity,location,lat,long
urn:test.org:place.1,urn:test.org:geo.1,16,17
urn:test.org:place.2,urn:test.org:geo.2,15,16
urn:test.org:place.3,urn:test.org:geo.3,15,17
urn:test.org:place.4,urn:test.org:geo.4,,
{code}
All four locations are expected in the result set because the `OPTIONAL` graph
pattern is translated to a `leftjoin` whose left-hand side is
`triple ?entity schema:geo ?location`.
However, for `geo.4` no value is expected for either `?lat` or `?long`, as this
resource only defines a longitude and therefore does not match
{code}
(bgp
(triple ?location schema:latitude ?lat)
(triple ?location schema:longitude ?long)
)
{code}
Marmotta responds with
{code}
entity,location,lat,long
urn:test.org:place.1,urn:test.org:geo.1,16,17
urn:test.org:place.2,urn:test.org:geo.2,15,16
urn:test.org:place.3,urn:test.org:geo.3,15,17
urn:test.org:place.4,urn:test.org:geo.4,,17
{code}
Note that the longitude is returned for the resource `geo.4`, although the
OPTIONAL pattern as a whole does not match for it.
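The expected behaviour of Test Case 1 can be reproduced with a small in-memory model of the algebra. This is only a sketch under my own assumptions: `match_bgp` and `left_join` are hypothetical helpers written for this report (not Marmotta or KiWi code), short names such as `geo.1` and `latitude` stand in for the full URIs, and literals are kept as plain strings.

```python
# Geo nodes from the Data section (property -> lexical value); short names
# replace the full URIs for readability.
GEO = {
    "geo.1": {"latitude": "16", "longitude": "17", "elevation": "123"},
    "geo.2": {"latitude": "15", "longitude": "16", "elevation": "99"},
    "geo.3": {"latitude": "15", "longitude": "17"},
    "geo.4": {"longitude": "17", "elevation": "123"},
}
TRIPLES = [(f"place.{i}", "geo", f"geo.{i}") for i in range(1, 5)]
TRIPLES += [(node, prop, val) for node, props in GEO.items() for prop, val in props.items()]


def match_bgp(patterns):
    """Solutions that bind every triple pattern of a basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in TRIPLES:
                binding = dict(solution)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        if binding.setdefault(term, value) != value:
                            break  # variable already bound to something else
                    elif term != value:
                        break  # constant does not match
                else:
                    extended.append(binding)
        solutions = extended
    return solutions


def left_join(left, right):
    """LeftJoin without a filter: every left solution is kept."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right
                  if all(l[k] == r[k] for k in l.keys() & r.keys())]
        result.extend(merged or [l])
    return result


places = match_bgp([("?entity", "geo", "?location")])
optional = match_bgp([("?location", "latitude", "?lat"),
                      ("?location", "longitude", "?long")])
rows = [tuple(s.get(v) for v in ("?entity", "?location", "?lat", "?long"))
        for s in left_join(places, optional)]
for row in rows:
    print(row)
```

In this model the BGP inside the OPTIONAL either matches as a whole or not at all, so the row for `geo.4` carries neither `?lat` nor `?long`.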
h2. Test Case 2
As a variation we now also include the `schema:elevation` in the OPTIONAL graph
pattern.
{code}
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
?entity schema:geo ?location
OPTIONAL {
?location schema:latitude ?lat .
?location schema:longitude ?long .
?location schema:elevation ?alt .
}
}
{code}
This query translates to the following algebra
{code}
(base <http://example/base/>
(prefix ((schema: <http://schema.org/>)
(rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
(leftjoin
(bgp (triple ?entity schema:geo ?location))
(bgp
(triple ?location schema:latitude ?lat)
(triple ?location schema:longitude ?long)
(triple ?location schema:elevation ?alt)
))))
{code}
The expected result would have 4 result rows where `lat`, `long` and `alt`
values are only provided for `geo.1` and `geo.2`.
{code}
entity,location,lat,long,alt
urn:test.org:place.1,urn:test.org:geo.1,16,17,123
urn:test.org:place.2,urn:test.org:geo.2,15,16,99
urn:test.org:place.3,urn:test.org:geo.3,,,
urn:test.org:place.4,urn:test.org:geo.4,,,
{code}
With this query Marmotta behaves very strangely: the results depend on the
order of the triple patterns in the `OPTIONAL` graph pattern. I will not
include all variations, just two examples:
{code}
OPTIONAL {
?location schema:latitude ?lat .
?location schema:longitude ?long .
?location schema:elevation ?alt .
}
{code}
gives
{code}
entity,location,lat,long,alt
urn:test.org:place.1,urn:test.org:geo.1,1.6E1,1.7E1,123
urn:test.org:place.2,urn:test.org:geo.2,1.5E1,1.6E1,99
urn:test.org:place.4,urn:test.org:geo.4,,1.7E1,123
{code}
while
{code}
OPTIONAL {
?location schema:longitude ?long .
?location schema:latitude ?lat .
?location schema:elevation ?alt .
}
{code}
gives
{code}
entity,location,long,lat,alt
urn:test.org:place.1,urn:test.org:geo.1,1.7E1,1.6E1,123
urn:test.org:place.2,urn:test.org:geo.2,1.6E1,1.5E1,99
{code}
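This behavior further indicates that `OPTIONAL` patterns are processed
incorrectly: rows are dropped entirely, which a `leftjoin` must never do, and
the outcome changes with the order of the triple patterns.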
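To illustrate that the order of triple patterns inside a BGP should not matter, the following sketch evaluates all six orderings of the OPTIONAL patterns against an in-memory copy of the test data and compares the results. As before, `match_bgp`, `left_join` and `evaluate` are hypothetical helpers of my own, short names replace the full URIs, and unbound variables are shown as empty strings.

```python
from itertools import permutations

# Geo nodes from the Data section (property -> lexical value).
GEO = {
    "geo.1": {"latitude": "16", "longitude": "17", "elevation": "123"},
    "geo.2": {"latitude": "15", "longitude": "16", "elevation": "99"},
    "geo.3": {"latitude": "15", "longitude": "17"},
    "geo.4": {"longitude": "17", "elevation": "123"},
}
TRIPLES = [(f"place.{i}", "geo", f"geo.{i}") for i in range(1, 5)]
TRIPLES += [(node, prop, val) for node, props in GEO.items() for prop, val in props.items()]


def match_bgp(patterns):
    """Solutions that bind every triple pattern of a basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in TRIPLES:
                binding = dict(solution)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        if binding.setdefault(term, value) != value:
                            break  # variable already bound to something else
                    elif term != value:
                        break  # constant does not match
                else:
                    extended.append(binding)
        solutions = extended
    return solutions


def left_join(left, right):
    """LeftJoin without a filter: every left solution is kept."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right
                  if all(l[k] == r[k] for k in l.keys() & r.keys())]
        result.extend(merged or [l])
    return result


OPTIONAL = [("?location", "latitude", "?lat"),
            ("?location", "longitude", "?long"),
            ("?location", "elevation", "?alt")]
VARIABLES = ("?entity", "?location", "?lat", "?long", "?alt")
places = match_bgp([("?entity", "geo", "?location")])


def evaluate(optional_patterns):
    """Rows of the left join, sorted, with '' for unbound variables."""
    solutions = left_join(places, match_bgp(optional_patterns))
    return sorted(tuple(s.get(v, "") for v in VARIABLES) for s in solutions)


results = [evaluate(list(order)) for order in permutations(OPTIONAL)]
order_independent = all(r == results[0] for r in results)
print(order_independent)
for row in results[0]:
    print(row)
```

Every ordering yields the same four rows in this model, with optional values bound only for `geo.1` and `geo.2`.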
h2. Test Case 3
Modifying the query to
{code}
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
?entity schema:geo ?location
OPTIONAL {
?location schema:latitude ?lat .
?location schema:longitude ?long .
}
OPTIONAL {
?location schema:elevation ?alt .
}
}
{code}
returns results similar to _Test Case 1_: there are 4 result rows, but `geo.4`
again has the unexpected value for `?long`.
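For reference, two sibling OPTIONAL blocks correspond to two chained `leftjoin` operations, the second applied to the result of the first. The sketch below evaluates that shape over an in-memory copy of the test data. The helpers `match_bgp` and `left_join` are hypothetical and written only for this report, and short names stand in for the full URIs.

```python
# Geo nodes from the Data section (property -> lexical value).
GEO = {
    "geo.1": {"latitude": "16", "longitude": "17", "elevation": "123"},
    "geo.2": {"latitude": "15", "longitude": "16", "elevation": "99"},
    "geo.3": {"latitude": "15", "longitude": "17"},
    "geo.4": {"longitude": "17", "elevation": "123"},
}
TRIPLES = [(f"place.{i}", "geo", f"geo.{i}") for i in range(1, 5)]
TRIPLES += [(node, prop, val) for node, props in GEO.items() for prop, val in props.items()]


def match_bgp(patterns):
    """Solutions that bind every triple pattern of a basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in TRIPLES:
                binding = dict(solution)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        if binding.setdefault(term, value) != value:
                            break  # variable already bound to something else
                    elif term != value:
                        break  # constant does not match
                else:
                    extended.append(binding)
        solutions = extended
    return solutions


def left_join(left, right):
    """LeftJoin without a filter: every left solution is kept."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right
                  if all(l[k] == r[k] for k in l.keys() & r.keys())]
        result.extend(merged or [l])
    return result


places = match_bgp([("?entity", "geo", "?location")])
lat_long = match_bgp([("?location", "latitude", "?lat"),
                      ("?location", "longitude", "?long")])
alt = match_bgp([("?location", "elevation", "?alt")])

# Sibling OPTIONALs: leftjoin(leftjoin(places, lat_long), alt)
solutions = left_join(left_join(places, lat_long), alt)
rows = [tuple(s.get(v) for v in ("?entity", "?location", "?lat", "?long", "?alt"))
        for s in solutions]
for row in rows:
    print(row)
```

In this model `geo.4` gets its elevation from the second OPTIONAL, which is independent of the first, but still no `?lat` or `?long`.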
h2. Test Case 4
This test case assumes that the user requires `lat` and `long` and optionally
wants the `alt` but only for resources that do have a valid location.
{code}
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
?entity schema:geo ?location
OPTIONAL {
?location schema:latitude ?lat .
?location schema:longitude ?long .
OPTIONAL {
?location schema:elevation ?alt .
}
}
}
{code}
This translates to the following algebra
{code}
(base <http://example/base/>
(prefix ((schema: <http://schema.org/>)
(rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
(leftjoin
(bgp (triple ?entity schema:geo ?location))
(leftjoin
(bgp
(triple ?location schema:latitude ?lat)
(triple ?location schema:longitude ?long)
)
(bgp (triple ?location schema:elevation ?alt))))))
{code}
So the `lat` and `long` bindings are combined with `alt` in an inner
`leftjoin`. That result is then the right-hand side of another `leftjoin` with
the results of `?entity schema:geo ?location`. `geo.3` defines both `lat` and
`long`, so it keeps those values with `alt` unbound, while `geo.4` has no
latitude and gets no optional values at all. The expected results are therefore
as follows
{code}
entity,location,lat,long,alt
urn:test.org:place.1,urn:test.org:geo.1,16,17,123
urn:test.org:place.2,urn:test.org:geo.2,15,16,99
urn:test.org:place.3,urn:test.org:geo.3,15,17,
urn:test.org:place.4,urn:test.org:geo.4,,,
{code}
Marmotta however returns
{code}
entity,location,lat,long,alt
urn:test.org:place.1,urn:test.org:geo.1,16,17,123
urn:test.org:place.2,urn:test.org:geo.2,15,16,99
urn:test.org:place.3,urn:test.org:geo.3,15,17,
urn:test.org:place.4,urn:test.org:geo.4,,17,123
{code}
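The nested `leftjoin` of Test Case 4 can be sketched over an in-memory copy of the test data as follows. This is an illustration under my own assumptions, not Marmotta or KiWi code: `match_bgp` and `left_join` are hypothetical helpers and short names replace the full URIs.

```python
# Geo nodes from the Data section (property -> lexical value).
GEO = {
    "geo.1": {"latitude": "16", "longitude": "17", "elevation": "123"},
    "geo.2": {"latitude": "15", "longitude": "16", "elevation": "99"},
    "geo.3": {"latitude": "15", "longitude": "17"},
    "geo.4": {"longitude": "17", "elevation": "123"},
}
TRIPLES = [(f"place.{i}", "geo", f"geo.{i}") for i in range(1, 5)]
TRIPLES += [(node, prop, val) for node, props in GEO.items() for prop, val in props.items()]


def match_bgp(patterns):
    """Solutions that bind every triple pattern of a basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in TRIPLES:
                binding = dict(solution)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        if binding.setdefault(term, value) != value:
                            break  # variable already bound to something else
                    elif term != value:
                        break  # constant does not match
                else:
                    extended.append(binding)
        solutions = extended
    return solutions


def left_join(left, right):
    """LeftJoin without a filter: every left solution is kept."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right
                  if all(l[k] == r[k] for k in l.keys() & r.keys())]
        result.extend(merged or [l])
    return result


places = match_bgp([("?entity", "geo", "?location")])
lat_long = match_bgp([("?location", "latitude", "?lat"),
                      ("?location", "longitude", "?long")])
alt = match_bgp([("?location", "elevation", "?alt")])

# Nested OPTIONAL: leftjoin(places, leftjoin(lat_long, alt))
solutions = left_join(places, left_join(lat_long, alt))
rows = [tuple(s.get(v) for v in ("?entity", "?location", "?lat", "?long", "?alt"))
        for s in solutions]
for row in rows:
    print(row)
```

Because the elevation pattern is only reachable through a matching lat/long pair, this model binds no optional value for `geo.4`, while `geo.3` keeps `?lat` and `?long` with `?alt` unbound.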
All test cases show that OPTIONAL query segments are not correctly evaluated by
the SPARQL implementation of the KiWi triple store.