[ 
https://issues.apache.org/jira/browse/MARMOTTA-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dietmar Glachs reassigned MARMOTTA-603:
---------------------------------------

    Assignee: Dietmar Glachs

> SPARQL OPTIONAL issues
> ----------------------
>
>                 Key: MARMOTTA-603
>                 URL: https://issues.apache.org/jira/browse/MARMOTTA-603
>             Project: Marmotta
>          Issue Type: Bug
>          Components: KiWi Triple Store
>    Affects Versions: 3.3.0
>            Reporter: Rupert Westenthaler
>            Assignee: Dietmar Glachs
>            Priority: Critical
>
> The SPARQL implementation of the KiWi triple store seems to have issues with 
> the evaluation of OPTIONAL segments of SPARQL queries. Test data and test 
> queries are provided below.
> h2. Data
> {code}
>       <urn:test.org:place.1> rdf:type schema:Place ;
>               schema:geo <urn:test.org:geo.1> ;
>               schema:name "Place 1" .
>       <urn:test.org:geo.1> rdf:type schema:GeoCoordinates ;
>               schema:latitude "16"^^xsd:double ;
>               schema:longitude "17"^^xsd:double ;
>               schema:elevation "123"^^xsd:int .
>       <urn:test.org:place.2> rdf:type schema:Place ;
>               schema:geo <urn:test.org:geo.2> ;
>               schema:name "Place 2" .
>       <urn:test.org:geo.2> rdf:type schema:GeoCoordinates ;
>               schema:latitude "15"^^xsd:double ;
>               schema:longitude "16"^^xsd:double ;
>               schema:elevation "99"^^xsd:int .
>       <urn:test.org:place.3> rdf:type schema:Place ;
>               schema:geo <urn:test.org:geo.3> ;
>               schema:name "Place 3" .
>       <urn:test.org:geo.3> rdf:type schema:GeoCoordinates ;
>               schema:latitude "15"^^xsd:double ;
>               schema:longitude "17"^^xsd:double .
>       <urn:test.org:place.4> rdf:type schema:Place ;
>               schema:geo <urn:test.org:geo.4> ;
>               schema:name "Place 4" .
>       <urn:test.org:geo.4> rdf:type schema:GeoCoordinates ;
>               schema:longitude "17"^^xsd:double ;
>               schema:elevation "123"^^xsd:int .
> {code}
> Note that `geo.1` and `geo.2` define latitude, longitude and elevation. 
> `geo.3` has no elevation, and `geo.4` is missing the latitude to simulate 
> invalid geo coordinate data.
> h2. Test Case 1
> The following query uses an OPTIONAL graph pattern including 
> `schema:latitude` and `schema:longitude`. This assumes a user only wants 
> lat/long values for locations that define both.
> {code}
>     PREFIX schema: <http://schema.org/>
>     PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>     SELECT * WHERE {
>         ?entity schema:geo ?location
>         OPTIONAL {
>             ?location schema:latitude ?lat .
>             ?location schema:longitude ?long .
>         }
>     }
> {code}
> This query translates to the following algebra
> {code}
>     (base <http://example/base/>
>         (prefix ((schema: <http://schema.org/>)
>                 (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
>             (leftjoin
>                 (bgp (triple ?entity schema:geo ?location))
>                 (bgp
>                     (triple ?location schema:latitude ?lat)
>                     (triple ?location schema:longitude ?long)
>                 ))))
> {code}
> The expected results are 
> {code}
>     entity,location,lat,long
>     urn:test.org:place.1,urn:test.org:geo.1,16,17
>     urn:test.org:place.2,urn:test.org:geo.2,15,16
>     urn:test.org:place.3,urn:test.org:geo.3,15,17
>     urn:test.org:place.4,urn:test.org:geo.4,,
> {code}
> All four locations are expected in the result set, as the `OPTIONAL` graph 
> pattern is translated to a `leftjoin` with `triple ?entity schema:geo 
> ?location`.
> However, for `geo.4` no value is expected for `?lat` or `?long`, as this 
> resource only defines a longitude and therefore does not match
> {code}
>     (bgp
>         (triple ?location schema:latitude ?lat)
>         (triple ?location schema:longitude ?long)
>     )
> {code}
> Marmotta responds with 
> {code}
>     entity,location,lat,long
>     urn:test.org:place.1,urn:test.org:geo.1,16,17
>     urn:test.org:place.2,urn:test.org:geo.2,15,16
>     urn:test.org:place.3,urn:test.org:geo.3,15,17
>     urn:test.org:place.4,urn:test.org:geo.4,,17
> {code}
> Note that the longitude is returned for the resource `geo.4`.
> h2. Test Case 2
> As a variation we now also include the `schema:elevation` in the OPTIONAL 
> graph pattern.
> {code}
>     PREFIX schema: <http://schema.org/>
>     PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>     SELECT * WHERE {
>         ?entity schema:geo ?location
>         OPTIONAL {
>             ?location schema:latitude ?lat .
>             ?location schema:longitude ?long .
>             ?location schema:elevation ?alt .
>         }
>     }
> {code}
> This query translates to the following algebra
> {code}
>     (base <http://example/base/>
>         (prefix ((schema: <http://schema.org/>)
>                 (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
>             (leftjoin
>                 (bgp (triple ?entity schema:geo ?location))
>                 (bgp
>                     (triple ?location schema:latitude ?lat)
>                     (triple ?location schema:longitude ?long)
>                     (triple ?location schema:elevation ?alt)
>                 ))))
> {code}
> The expected result would have 4 result rows where `lat`, `long` and `alt` 
> values are only provided for `geo.1` and `geo.2`.
> {code}
>     entity,location,lat,long,alt
>     urn:test.org:place.1,urn:test.org:geo.1,16,17,123
>     urn:test.org:place.2,urn:test.org:geo.2,15,16,99
>     urn:test.org:place.3,urn:test.org:geo.3,,,
>     urn:test.org:place.4,urn:test.org:geo.4,,,
> {code}
> With this query Marmotta behaves very strangely, as the results depend on the 
> ordering of the triple patterns in the `OPTIONAL` graph pattern. I will not 
> include all variations but just provide two examples:
> {code}
>         OPTIONAL {
>             ?location schema:latitude ?lat .
>             ?location schema:longitude ?long .
>             ?location schema:elevation ?alt .
>         }
> {code}
> gives
> {code}
>     entity,location,lat,long,alt
>     urn:test.org:place.1,urn:test.org:geo.1,1.6E1,1.7E1,123
>     urn:test.org:place.2,urn:test.org:geo.2,1.5E1,1.6E1,99
>     urn:test.org:place.4,urn:test.org:geo.4,,1.7E1,123
> {code}
> while
> {code}
>         OPTIONAL {
>             ?location schema:longitude ?long .
>             ?location schema:latitude ?lat .
>             ?location schema:elevation ?alt .
>         }
> {code}
> gives
> {code}
>     entity,location,long,lat,alt
>     urn:test.org:place.1,urn:test.org:geo.1,1.7E1,1.6E1,123
>     urn:test.org:place.2,urn:test.org:geo.2,1.6E1,1.5E1,99
> {code}
> This behavior further indicates that `OPTIONAL` patterns are processed 
> incorrectly.
> h2. Test Case 3
> Modifying the query to 
> {code}
>     PREFIX schema: <http://schema.org/>
>     PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>     SELECT * WHERE {
>         ?entity schema:geo ?location
>         OPTIONAL {
>             ?location schema:latitude ?lat .
>             ?location schema:longitude ?long .
>         }
>         OPTIONAL {
>             ?location schema:elevation ?alt .
>         }
>     }
> {code}
> gives a result similar to _Test Case 1_: there are 4 results, but for 
> `geo.4` we get the unexpected value for `?long`.
> h2. Test Case 4
> This test case assumes that the user wants `lat` and `long` together and, 
> optionally, `alt`, but only for resources that have a valid location.
> {code}
>     PREFIX schema: <http://schema.org/>
>     PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>     SELECT * WHERE {
>         ?entity schema:geo ?location
>         OPTIONAL {
>             ?location schema:latitude ?lat .
>             ?location schema:longitude ?long .
>             OPTIONAL {
>                 ?location schema:elevation ?alt .
>             }
>         }
>     }
> {code}
> This translates to the following algebra
> {code}
>     (base <http://example/base/>
>         (prefix ((schema: <http://schema.org/>)
>                 (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
>             (leftjoin
>                 (bgp (triple ?entity schema:geo ?location))
>                 (leftjoin
>                     (bgp
>                         (triple ?location schema:latitude ?lat)
>                         (triple ?location schema:longitude ?long)
>                     )
>                     (bgp (triple ?location schema:elevation ?alt))))))
> {code}
> So the `lat` and `long` values are combined with `alt` in a `leftjoin`. Then 
> the result is used in another `leftjoin` with the results of `?entity 
> schema:geo ?location`. Therefore the expected results are as follows
> {code}
>     entity,location,lat,long,alt
>     urn:test.org:place.1,urn:test.org:geo.1,16,17,123
>     urn:test.org:place.2,urn:test.org:geo.2,15,16,99
>     urn:test.org:place.3,urn:test.org:geo.3,15,17,
>     urn:test.org:place.4,urn:test.org:geo.4,,,
> {code}
> Marmotta however returns
> {code}
>     entity,location,lat,long,alt
>     urn:test.org:place.1,urn:test.org:geo.1,16,17,123
>     urn:test.org:place.2,urn:test.org:geo.2,15,16,99
>     urn:test.org:place.3,urn:test.org:geo.3,15,17,
>     urn:test.org:place.4,urn:test.org:geo.4,,17,123
> {code}
> All test cases show that OPTIONAL query segments are not correctly evaluated 
> by the SPARQL implementation of the KiWi triple store.
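
The expected results of Test Cases 1 and 2 can be cross-checked with a minimal sketch of the `leftjoin` semantics described above. It is not Marmotta or KiWi code: the helper names (`match_pattern`, `match_bgp`, `left_join`, `rows`) are hypothetical, the data is a hand-copied subset of the test data (shortened identifiers, plain numbers instead of typed literals, no `rdf:type` triples), and evaluating the optional BGP under each left-hand binding is a simplification that only holds because the queries contain no FILTER.

```python
# Minimal sketch of SPARQL LeftJoin (OPTIONAL) semantics over the test data.
# All names here are illustrative assumptions, not Marmotta/KiWi APIs.

GEO, LAT, LONG, ALT = ("schema:geo", "schema:latitude",
                       "schema:longitude", "schema:elevation")

TRIPLES = [
    ("place.1", GEO, "geo.1"), ("geo.1", LAT, 16), ("geo.1", LONG, 17), ("geo.1", ALT, 123),
    ("place.2", GEO, "geo.2"), ("geo.2", LAT, 15), ("geo.2", LONG, 16), ("geo.2", ALT, 99),
    ("place.3", GEO, "geo.3"), ("geo.3", LAT, 15), ("geo.3", LONG, 17),
    ("place.4", GEO, "geo.4"), ("geo.4", LONG, 17), ("geo.4", ALT, 123),
]


def match_pattern(pattern, binding):
    """Yield every extension of `binding` that matches one triple pattern."""
    for triple in TRIPLES:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Variable: bind it, or check it against an existing binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                # Constant term that does not match this triple.
                break
        else:
            yield candidate


def match_bgp(patterns, binding=None):
    """Solutions of a basic graph pattern: ALL triple patterns must match."""
    solutions = [dict(binding or {})]
    for pattern in patterns:
        solutions = [ext for sol in solutions for ext in match_pattern(pattern, sol)]
    return solutions


def left_join(left, optional_patterns):
    """Keep every left solution; extend it only if the whole optional BGP matches."""
    result = []
    for sol in left:
        extended = match_bgp(optional_patterns, sol)
        result.extend(extended if extended else [sol])
    return result


def rows(solutions, variables):
    """Project solutions to tuples ordered by entity; unbound variables become None."""
    ordered = sorted(solutions, key=lambda s: s["?entity"])
    return [tuple(sol.get(v) for v in variables) for sol in ordered]


base = match_bgp([("?entity", GEO, "?location")])

# Test Case 1: OPTIONAL { lat . long }
case1 = left_join(base, [("?location", LAT, "?lat"),
                         ("?location", LONG, "?long")])

# Test Case 2: OPTIONAL { lat . long . alt }
case2 = left_join(base, [("?location", LAT, "?lat"),
                         ("?location", LONG, "?long"),
                         ("?location", ALT, "?alt")])

for row in rows(case1, ["?entity", "?location", "?lat", "?long"]):
    print(row)
for row in rows(case2, ["?entity", "?location", "?lat", "?long", "?alt"]):
    print(row)
```

In this sketch a left-hand solution is extended only when every triple pattern of the optional BGP matches, so `geo.4` never receives a partial `?long` binding, and reordering the triple patterns inside the optional BGP does not change the result.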



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)