Rupert Westenthaler created MARMOTTA-603:
--------------------------------------------

             Summary: SPARQL OPTIONAL issues
                 Key: MARMOTTA-603
                 URL: https://issues.apache.org/jira/browse/MARMOTTA-603
             Project: Marmotta
          Issue Type: Bug
          Components: KiWi Triple Store
    Affects Versions: 3.3.0
            Reporter: Rupert Westenthaler
            Priority: Critical


The SPARQL implementation of the KiWi triple store seems to have issues with the 
evaluation of OPTIONAL segments of SPARQL queries. Test data and test queries 
that reproduce the problem are provided below.

h2. Data

{code}
        <urn:test.org:place.1> rdf:type schema:Palce ;
                schema:geo <urn:test.org:geo.1> ;
                schema:name "Place 1" .

        <urn:test.org:geo.1> rdf:type schema:GeoCoordinates ;
                schema:latitude "16"^^xsd:double ;
                schema:longitude "17"^^xsd:double ;
                schema:elevation "123"^^xsd:int .

        <urn:test.org:place.2> rdf:type schema:Palce ;
                schema:geo <urn:test.org:geo.2> ;
                schema:name "Place 2" .

        <urn:test.org:geo.2> rdf:type schema:GeoCoordinates ;
                schema:latitude "15"^^xsd:double ;
                schema:longitude "16"^^xsd:double ;
                schema:elevation "99"^^xsd:int .

        <urn:test.org:place.3> rdf:type schema:Palce ;
                schema:geo <urn:test.org:geo.3> ;
                schema:name "Place 3" .

        <urn:test.org:geo.3> rdf:type schema:GeoCoordinates ;
                schema:latitude "15"^^xsd:double ;
                schema:longitude "17"^^xsd:double .

        <urn:test.org:place.4> rdf:type schema:Palce ;
                schema:geo <urn:test.org:geo.4> ;
                schema:name "Place 4" .

        <urn:test.org:geo.4> rdf:type schema:GeoCoordinates ;
                schema:longitude "17"^^xsd:double ;
                schema:elevation "123"^^xsd:int .
{code}

Note that `geo.1` and `geo.2` define latitude, longitude and elevation. 
`geo.3` has no elevation, and `geo.4` is missing the latitude to simulate 
invalid geo coordinate data.

h2. Test Case 1

The following query uses an OPTIONAL graph pattern that includes both 
`schema:latitude` and `schema:longitude`. This assumes the user wants lat/long 
values only for locations that define both.

{code}
    PREFIX schema: <http://schema.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT * WHERE {
        ?entity schema:geo ?location
        OPTIONAL {
            ?location schema:latitude ?lat .
            ?location schema:longitude ?long .
        }
    }
{code}

This translates to the algebra

{code}
    (base <http://example/base/>
        (prefix ((schema: <http://schema.org/>)
                (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
            (leftjoin
            (bgp (triple ?entity schema:geo ?location))
            (bgp
                (triple ?location schema:latitude ?lat)
                (triple ?location schema:longitude ?long)
             ))))
{code}

The expected results are

{code}
    entity,location,lat,long
    urn:test.org:place.1,urn:test.org:geo.1,16,17
    urn:test.org:place.2,urn:test.org:geo.2,15,16
    urn:test.org:place.3,urn:test.org:geo.3,15,17
    urn:test.org:place.4,urn:test.org:geo.4,,
{code}

All four locations are expected in the result set because the `OPTIONAL` graph 
pattern is translated to a `leftjoin` with `triple ?entity schema:geo 
?location`.

However, for `geo.4` no value is expected for `?lat` or `?long`, as this 
resource only defines a longitude and therefore does not match

{code}
    (bgp
        (triple ?location schema:latitude ?lat)
        (triple ?location schema:longitude ?long)
    )
{code}
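
The expected result for this test case can be reproduced by hand with a small in-memory evaluator. The following is only a sketch of the `leftjoin` semantics described above, not KiWi or Marmotta code: the data is a shortened copy of the test data (prefixes dropped), and `match_bgp` and `left_join` are hypothetical helper names introduced for illustration.

```python
# Toy copy of the test data; prefixes are dropped and names are shortened.
GEO = {
    "geo.1": {"latitude": 16, "longitude": 17, "elevation": 123},
    "geo.2": {"latitude": 15, "longitude": 16, "elevation": 99},
    "geo.3": {"latitude": 15, "longitude": 17},    # no elevation
    "geo.4": {"longitude": 17, "elevation": 123},  # no latitude
}
TRIPLES = [(f"place.{i}", "geo", f"geo.{i}") for i in range(1, 5)]
TRIPLES += [(s, p, o) for s, props in GEO.items() for p, o in props.items()]


def match_bgp(patterns):
    """Return every solution (dict) that matches ALL triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in TRIPLES:
                candidate = dict(solution)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check an existing binding.
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions


def left_join(left, right):
    """Keep every left solution; extend it with compatible right solutions."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right
                  if all(l.get(k, v) == v for k, v in r.items())]
        result.extend(merged or [l])  # no match: keep the left side as is
    return result


left = match_bgp([("?entity", "geo", "?location")])
optional = match_bgp([("?location", "latitude", "?lat"),
                      ("?location", "longitude", "?long")])
rows = [(s["?entity"], s.get("?lat"), s.get("?long"))
        for s in left_join(left, optional)]
for row in rows:
    print(row)
```

Because the optional basic graph pattern is matched as a whole, `geo.4` produces no solution on the right-hand side, so `place.4` is kept with both `?lat` and `?long` unbound.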

Marmotta responds with

{code}
    entity,location,lat,long
    urn:test.org:place.1,urn:test.org:geo.1,16,17
    urn:test.org:place.2,urn:test.org:geo.2,15,16
    urn:test.org:place.3,urn:test.org:geo.3,15,17
    urn:test.org:place.4,urn:test.org:geo.4,,17
{code}

Note that the longitude is returned for the resource `geo.4`.

h2. Test Case 2

As a variation we now also include the `schema:elevation` in the OPTIONAL graph 
pattern.

{code}
    PREFIX schema: <http://schema.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT * WHERE {
        ?entity schema:geo ?location
        OPTIONAL {
            ?location schema:latitude ?lat .
            ?location schema:longitude ?long .
            ?location schema:elevation ?alt .
        }
    }
{code}

This query translates to the following algebra

{code}
    (base <http://example/base/>
        (prefix ((schema: <http://schema.org/>)
                   (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
            (leftjoin
            (bgp (triple ?entity schema:geo ?location))
            (bgp
                (triple ?location schema:latitude ?lat)
                (triple ?location schema:longitude ?long)
                (triple ?location schema:elevation ?alt)
            ))))
{code}

The expected result would have 4 result rows where `lat`, `long` and `alt` 
values are only provided for `geo.1` and `geo.2`.

{code}
    entity,location,lat,long,alt
    urn:test.org:place.1,urn:test.org:geo.1,16,17,123
    urn:test.org:place.2,urn:test.org:geo.2,15,16,99
    urn:test.org:place.3,urn:test.org:geo.3,,,
    urn:test.org:place.4,urn:test.org:geo.4,,,
{code}

With this query Marmotta behaves very strangely, as the results depend on the 
ordering of the triple patterns in the `OPTIONAL` graph pattern. I will not 
include all variations but just provide two examples:

{code}
        OPTIONAL {
            ?location schema:latitude ?lat .
            ?location schema:longitude ?long .
            ?location schema:elevation ?alt .
        }
{code}

gives

{code}
    entity,location,lat,long,alt
    urn:test.org:place.1,urn:test.org:geo.1,1.6E1,1.7E1,123
    urn:test.org:place.2,urn:test.org:geo.2,1.5E1,1.6E1,99
    urn:test.org:place.4,urn:test.org:geo.4,,1.7E1,123
{code}

while

{code}
        OPTIONAL {
            ?location schema:longitude ?long .
            ?location schema:latitude ?lat .
            ?location schema:elevation ?alt .
        }
{code}

gives

{code}
    entity,location,long,lat,alt
    urn:test.org:place.1,urn:test.org:geo.1,1.7E1,1.6E1,123
    urn:test.org:place.2,urn:test.org:geo.2,1.6E1,1.5E1,99
{code}

This behavior further indicates that `OPTIONAL` patterns are processed incorrectly.
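
To illustrate why the ordering should not matter, the sketch below evaluates the query for every permutation of the three triple patterns and checks that the result is always the same. It is a toy evaluator over a shortened copy of the test data, not Marmotta code; `match_bgp`, `left_join` and `evaluate` are hypothetical names used only here.

```python
from itertools import permutations

# Toy copy of the test data; prefixes are dropped and names are shortened.
GEO = {
    "geo.1": {"latitude": 16, "longitude": 17, "elevation": 123},
    "geo.2": {"latitude": 15, "longitude": 16, "elevation": 99},
    "geo.3": {"latitude": 15, "longitude": 17},
    "geo.4": {"longitude": 17, "elevation": 123},
}
TRIPLES = [(f"place.{i}", "geo", f"geo.{i}") for i in range(1, 5)]
TRIPLES += [(s, p, o) for s, props in GEO.items() for p, o in props.items()]

OPTIONAL_PATTERNS = [
    ("?location", "latitude", "?lat"),
    ("?location", "longitude", "?long"),
    ("?location", "elevation", "?alt"),
]


def match_bgp(patterns):
    """Return every solution (dict) that matches ALL triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in TRIPLES:
                candidate = dict(solution)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions


def left_join(left, right):
    """Keep every left solution; extend it with compatible right solutions."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right
                  if all(l.get(k, v) == v for k, v in r.items())]
        result.extend(merged or [l])
    return result


def evaluate(optional_patterns):
    """Evaluate Test Case 2 with the given ordering of optional patterns."""
    left = match_bgp([("?entity", "geo", "?location")])
    solutions = left_join(left, match_bgp(optional_patterns))
    return sorted((s["?entity"], s.get("?lat"), s.get("?long"), s.get("?alt"))
                  for s in solutions)


# Collect the distinct results over all six orderings.
distinct_results = {tuple(evaluate(list(order)))
                    for order in permutations(OPTIONAL_PATTERNS)}
print(len(distinct_results))
for row in evaluate(OPTIONAL_PATTERNS):
    print(row)
```

In this sketch every ordering yields the same four rows, with `?lat`, `?long` and `?alt` bound only for `geo.1` and `geo.2`.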

h2. Test Case 3

Modifying the query to 

{code}
    PREFIX schema: <http://schema.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT * WHERE {
        ?entity schema:geo ?location
        OPTIONAL {
            ?location schema:latitude ?lat .
            ?location schema:longitude ?long .
        }
        OPTIONAL {
            ?location schema:elevation ?alt .
        }
    }
{code}

yields a result similar to _Test Case 1_: there are 4 result rows, but for 
`geo.4` we again get the unexpected value for `?long`.
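
For reference, the two sequential `OPTIONAL` blocks correspond to two chained left joins. The sketch below works out by hand what they would yield on the test data, assuming standard left-join semantics. It is a toy evaluator over a shortened copy of the data, not Marmotta code, and `match_bgp` and `left_join` are hypothetical helper names.

```python
# Toy copy of the test data; prefixes are dropped and names are shortened.
GEO = {
    "geo.1": {"latitude": 16, "longitude": 17, "elevation": 123},
    "geo.2": {"latitude": 15, "longitude": 16, "elevation": 99},
    "geo.3": {"latitude": 15, "longitude": 17},
    "geo.4": {"longitude": 17, "elevation": 123},
}
TRIPLES = [(f"place.{i}", "geo", f"geo.{i}") for i in range(1, 5)]
TRIPLES += [(s, p, o) for s, props in GEO.items() for p, o in props.items()]


def match_bgp(patterns):
    """Return every solution (dict) that matches ALL triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in TRIPLES:
                candidate = dict(solution)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions


def left_join(left, right):
    """Keep every left solution; extend it with compatible right solutions."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right
                  if all(l.get(k, v) == v for k, v in r.items())]
        result.extend(merged or [l])
    return result


# First OPTIONAL: latitude and longitude must match together.
step1 = left_join(
    match_bgp([("?entity", "geo", "?location")]),
    match_bgp([("?location", "latitude", "?lat"),
               ("?location", "longitude", "?long")]),
)
# Second OPTIONAL: elevation is joined independently of lat/long.
step2 = left_join(step1, match_bgp([("?location", "elevation", "?alt")]))

rows = [(s["?entity"], s.get("?lat"), s.get("?long"), s.get("?alt"))
        for s in step2]
for row in rows:
    print(row)
```

Under these assumptions `geo.4` gets its elevation from the second `OPTIONAL` but no `?long`, since the first `OPTIONAL` does not match without a latitude.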

h2. Test Case 4

This test case assumes that the user requires `lat` and `long` and optionally 
wants the `alt` but only for resources that do have a valid location.

{code}
    PREFIX schema: <http://schema.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT * WHERE {
        ?entity schema:geo ?location
        OPTIONAL {
            ?location schema:latitude ?lat .
            ?location schema:longitude ?long .
            OPTIONAL {
                ?location schema:elevation ?alt .
            }
        }
    }
{code}

This translates to the following algebra

{code}
    (base <http://example/base/>
        (prefix ((schema: <http://schema.org/>)
                   (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
            (leftjoin
                (bgp (triple ?entity schema:geo ?location))
                (leftjoin
                    (bgp
                        (triple ?location schema:latitude ?lat)
                        (triple ?location schema:longitude ?long)
                    )
                        (bgp (triple ?location schema:elevation ?alt))))))
{code}

So the `lat` and `long` values are combined with `alt` in a `leftjoin`, and the 
result is then used in another `leftjoin` with the results of `?entity 
schema:geo ?location`. `geo.3` defines both latitude and longitude, so it keeps 
those values with `alt` unbound, while `geo.4` has no latitude and therefore 
matches nothing in the outer `OPTIONAL`. The expected results are as follows

{code}
    entity,location,lat,long,alt
    urn:test.org:place.1,urn:test.org:geo.1,16,17,123
    urn:test.org:place.2,urn:test.org:geo.2,15,16,99
    urn:test.org:place.3,urn:test.org:geo.3,15,17,
    urn:test.org:place.4,urn:test.org:geo.4,,,
{code}

Marmotta however returns

{code}
    entity,location,lat,long,alt
    urn:test.org:place.1,urn:test.org:geo.1,16,17,123
    urn:test.org:place.2,urn:test.org:geo.2,15,16,99
    urn:test.org:place.3,urn:test.org:geo.3,15,17,
    urn:test.org:place.4,urn:test.org:geo.4,,17,123
{code}

All test cases show that OPTIONAL query segments are not correctly evaluated by 
the SPARQL implementation of the KiWi triple store.




