[ 
https://issues.apache.org/jira/browse/JENA-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-1128.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Jena 3.1.0

> sdbquery doesn't work with MINUS
> --------------------------------
>
>                 Key: JENA-1128
>                 URL: https://issues.apache.org/jira/browse/JENA-1128
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: SDB
>    Affects Versions: Jena 3.0.1
>         Environment: jena-sdb-3.1.0-SNAPSHOT dated 2 Feb 2016
> apache-jena-3.1.0-SNAPSHOT dated 2 Feb 2016
>            Reporter: Osma Suominen
>            Assignee: Andy Seaborne
>             Fix For: Jena 3.1.0
>
>
> I'm running a SPARQL query against an SDB store loaded with SKOS data. The intent of 
> the query is to check for broken links, i.e. skos:closeMatch relationships 
> that point to nonexistent concepts in another SKOS dataset. I have simplified 
> my query to a rather minimal test case below. For simplicity, the remote 
> data is included in the same graph as the local data.
> Here is my test data:
> {noformat}
> @prefix skos: <http://www.w3.org/2004/02/skos/core#>.
> @prefix local: <http://example.com/local/>.
> @prefix remote: <http://example.com/remote/>.
> local:conceptA a skos:Concept ;
>   skos:prefLabel "Local concept A"@en ;
>   skos:note "has a valid link to an existing remote concept" ;
>   skos:closeMatch remote:conceptC .
> local:conceptB a skos:Concept ;
>   skos:prefLabel "Local concept B"@en ;
>   skos:note "has a broken link to a nonexistent remote concept" ;
>   skos:closeMatch remote:conceptD .
> remote:conceptC a skos:Concept ;
>   skos:prefLabel "Remote concept C"@en .
> {noformat}
> This is my SPARQL query:
> {noformat}
> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
> SELECT * WHERE {
>   ?local skos:closeMatch ?remote .
>   MINUS { ?remote a skos:Concept }
> }
> {noformat}
> If I run the query using the command line tool "sparql" from the apache-jena 
> distribution, it returns the correct result, i.e. the one concept with the 
> broken link:
> {noformat}
> ------------------------------------------------------------------------------
> | local                               | remote                               |
> ==============================================================================
> | <http://example.com/local/conceptB> | <http://example.com/remote/conceptD> |
> ------------------------------------------------------------------------------
> {noformat}
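
For reference, the expected answer can be checked by hand. The following is a minimal Python sketch of the SPARQL MINUS rule (drop a left-hand solution when some right-hand solution shares at least one variable with it and agrees on all shared variables), applied to the test data above. It is not Jena or SDB code; the helper names (compatible, minus) and the in-memory triple list are assumptions made for illustration only.

```python
# Illustrative sketch of SPARQL MINUS on the test data; not Jena/SDB code.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LOCAL = "http://example.com/local/"
REMOTE = "http://example.com/remote/"

# Only the triples the query can match; prefLabel and note triples are omitted.
TRIPLES = [
    (LOCAL + "conceptA", RDF_TYPE, SKOS + "Concept"),
    (LOCAL + "conceptA", SKOS + "closeMatch", REMOTE + "conceptC"),
    (LOCAL + "conceptB", RDF_TYPE, SKOS + "Concept"),
    (LOCAL + "conceptB", SKOS + "closeMatch", REMOTE + "conceptD"),
    (REMOTE + "conceptC", RDF_TYPE, SKOS + "Concept"),
]


def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def minus(left, right):
    """Keep a left solution unless a right solution shares a variable and is compatible."""
    return [
        l for l in left
        if not any((l.keys() & r.keys()) and compatible(l, r) for r in right)
    ]


# Left side:  ?local skos:closeMatch ?remote
left = [{"local": s, "remote": o} for s, p, o in TRIPLES if p == SKOS + "closeMatch"]
# Right side: ?remote a skos:Concept
right = [{"remote": s} for s, p, o in TRIPLES if p == RDF_TYPE and o == SKOS + "Concept"]

result = minus(left, right)
for row in result:
    print(row["local"], row["remote"])
```

Under these assumptions only the conceptB/conceptD row survives, which matches the output of the "sparql" command line tool shown above.
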
> But when I load the above data into an SDB database (MySQL) and run sdbquery 
> with the same SPARQL query, I get a different result with an extra row 
> (shown here with the --debug option):
> {noformat}
> PREFIX  skos: <http://www.w3.org/2004/02/skos/core#>
> SELECT  *
> WHERE
>   { ?local  skos:closeMatch  ?remote
>     MINUS
>       { ?remote  a                     skos:Concept }
>   }
> - - - - - - - - - - - - - -
> SELECT                                   -- V_3=?remote
>   R_3.lex AS V_3_lex, R_3.datatype AS V_3_datatype, R_3.lang AS V_3_lang, 
> R_3.type AS V_3_type
> FROM
>     ( SELECT                             -- ?remote:(T_2.s=>T_2.X_1)
>         T_2.s AS X_1
>       FROM Triples AS T_2                -- ?remote <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> skos:Concept
>       WHERE ( T_2.p = -6430697865200335348 -- Const: <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>          AND T_2.o = 1728757386496985884 -- Const: skos:Concept
>          )
>     ) AS T_2                             -- ?remote:(T_2.s=>T_2.X_1)
>   LEFT OUTER JOIN
>     Nodes AS R_3                         -- Var: ?remote
>   ON ( T_2.X_1 = R_3.hash )
> (minus
>   (SQL '''SqlSelectBlock/S_1             -- V_1=?local V_2=?remote
>         R_1.lex/V_1_lex R_1.datatype/V_1_datatype R_1.lang/V_1_lang 
> R_1.type/V_1_type 
>         R_2.lex/V_2_lex R_2.datatype/V_2_datatype R_2.lang/V_2_lang 
> R_2.type/V_2_type
>       Join/left outer
>         Join/left outer
>           SqlSelectBlock/T_1
>               T_1.p = 2699241716664962559
>             Table T_1                    -- ?local skos:closeMatch ?remote
>           Table R_1                      -- Var: ?local
>           Condition T_1.s = R_1.hash
>         Table R_2                        -- Var: ?remote
>         Condition T_1.o = R_2.hash''')
>   (SQL '''SqlSelectBlock/S_2             -- V_3=?remote
>         R_3.lex/V_3_lex R_3.datatype/V_3_datatype R_3.lang/V_3_lang 
> R_3.type/V_3_type
>       Join/left outer
>         SqlSelectBlock/T_2               -- ?remote:(T_2.s=>T_2.X_1)
>             T_2.s/X_1
>             T_2.p = -6430697865200335348
>             T_2.o = 1728757386496985884
>           Table T_2                      -- ?remote <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> skos:Concept
>         Table R_3                        -- Var: ?remote
>         Condition T_2.X_1 = R_3.hash''')
> )
> SELECT                                   -- V_1=?local V_2=?remote
>   R_1.lex AS V_1_lex, R_1.datatype AS V_1_datatype, R_1.lang AS V_1_lang, 
> R_1.type AS V_1_type, 
>   R_2.lex AS V_2_lex, R_2.datatype AS V_2_datatype, R_2.lang AS V_2_lang, 
> R_2.type AS V_2_type
> FROM
>     ( SELECT *
>       FROM Triples AS T_1                -- ?local skos:closeMatch ?remote
>       WHERE ( T_1.p = 2699241716664962559 -- Const: skos:closeMatch
>          )
>     ) AS T_1
>   LEFT OUTER JOIN
>     Nodes AS R_1                         -- Var: ?local
>   ON ( T_1.s = R_1.hash )
>   LEFT OUTER JOIN
>     Nodes AS R_2                         -- Var: ?remote
>   ON ( T_1.o = R_2.hash )
> ------------------------------------------------------------------------------
> | local                               | remote                               |
> ==============================================================================
> | <http://example.com/local/conceptA> | <http://example.com/remote/conceptC> |
> | <http://example.com/local/conceptB> | <http://example.com/remote/conceptD> |
> ------------------------------------------------------------------------------
> {noformat}
> If I change the query to use FILTER NOT EXISTS instead of MINUS, then 
> sdbquery also returns the correct result:
> {noformat}
> PREFIX  skos: <http://www.w3.org/2004/02/skos/core#>
> SELECT  *
> WHERE
>   { ?local  skos:closeMatch  ?remote
>     FILTER NOT EXISTS { ?remote  a                     skos:Concept }
>   }
> - - - - - - - - - - - - - -
> (filter (notexists
>            (SQL '''SqlSelectBlock/T_2    -- ?remote:(T_2.s=>T_2.X_1)
>                  T_2.s/X_1
>                  T_2.p = -6430697865200335348
>                  T_2.o = 1728757386496985884
>                Table T_2                 -- ?remote <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> skos:Concept''')
>   )
>   (SQL '''SqlSelectBlock/S_1             -- V_1=?local V_2=?remote
>         R_1.lex/V_1_lex R_1.datatype/V_1_datatype R_1.lang/V_1_lang 
> R_1.type/V_1_type 
>         R_2.lex/V_2_lex R_2.datatype/V_2_datatype R_2.lang/V_2_lang 
> R_2.type/V_2_type
>       Join/left outer
>         Join/left outer
>           SqlSelectBlock/T_1
>               T_1.p = 2699241716664962559
>             Table T_1                    -- ?local skos:closeMatch ?remote
>           Table R_1                      -- Var: ?local
>           Condition T_1.s = R_1.hash
>         Table R_2                        -- Var: ?remote
>         Condition T_1.o = R_2.hash''')
> )
> SELECT                                   -- V_1=?local V_2=?remote
>   R_1.lex AS V_1_lex, R_1.datatype AS V_1_datatype, R_1.lang AS V_1_lang, 
> R_1.type AS V_1_type, 
>   R_2.lex AS V_2_lex, R_2.datatype AS V_2_datatype, R_2.lang AS V_2_lang, 
> R_2.type AS V_2_type
> FROM
>     ( SELECT *
>       FROM Triples AS T_1                -- ?local skos:closeMatch ?remote
>       WHERE ( T_1.p = 2699241716664962559 -- Const: skos:closeMatch
>          )
>     ) AS T_1
>   LEFT OUTER JOIN
>     Nodes AS R_1                         -- Var: ?local
>   ON ( T_1.s = R_1.hash )
>   LEFT OUTER JOIN
>     Nodes AS R_2                         -- Var: ?remote
>   ON ( T_1.o = R_2.hash )
> SELECT *
> FROM Triples AS T_3                      -- <http://example.com/remote/conceptC> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> skos:Concept
> WHERE ( T_3.s = 5972767169237582230      -- Const: <http://example.com/remote/conceptC>
>    AND T_3.p = -6430697865200335348      -- Const: <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>    AND T_3.o = 1728757386496985884       -- Const: skos:Concept
>    )
> SELECT *
> FROM Triples AS T_4                      -- <http://example.com/remote/conceptD> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> skos:Concept
> WHERE ( T_4.s = 8175828786801660008      -- Const: <http://example.com/remote/conceptD>
>    AND T_4.p = -6430697865200335348      -- Const: <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>    AND T_4.o = 1728757386496985884       -- Const: skos:Concept
>    )
> ------------------------------------------------------------------------------
> | local                               | remote                               |
> ==============================================================================
> | <http://example.com/local/conceptB> | <http://example.com/remote/conceptD> |
> ------------------------------------------------------------------------------
> {noformat}
> However, in the actual query that this example is based on 
> (https://github.com/NatLibFi/Finto-data/blob/master/tools/yso-updater-sparql/5-ysa-removed-concepts.rq)
> using FILTER NOT EXISTS is not an efficient solution: the subtracted 
> part uses a federated query, so almost 30000 queries are sent to the 
> remote endpoint instead of just one.
> I'm using the most recent snapshots available from repository.apache.org.
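
The cost difference described in the report can be illustrated with a rough Python sketch. It assumes that FILTER NOT EXISTS is evaluated once per candidate row (as the per-concept SQL probes in the debug output suggest), while the MINUS operand is fetched once and subtracted locally. FakeEndpoint, its methods, and the 1000 made-up links are hypothetical stand-ins, not part of Jena or of any SPARQL client library; how a real engine schedules federated calls may differ.

```python
# Hypothetical request counter; not a real SPARQL client.
class FakeEndpoint:
    """Stand-in for a remote endpoint that counts incoming requests."""

    def __init__(self, concepts):
        self.concepts = set(concepts)
        self.requests = 0

    def ask_is_concept(self, uri):
        # One request per probe, as a per-row NOT EXISTS check would issue.
        self.requests += 1
        return uri in self.concepts

    def select_all_concepts(self):
        # One request that returns every concept, as a single MINUS operand would.
        self.requests += 1
        return set(self.concepts)


def broken_links_not_exists(links, endpoint):
    """Probe the endpoint once for each (local, remote) link."""
    return [(l, r) for l, r in links if not endpoint.ask_is_concept(r)]


def broken_links_minus(links, endpoint):
    """Fetch the remote concepts once, then subtract locally."""
    existing = endpoint.select_all_concepts()
    return [(l, r) for l, r in links if r not in existing]


# Made-up data: 1000 links, every tenth remote concept is missing.
links = [("local:c%d" % i, "remote:c%d" % i) for i in range(1000)]
existing = ["remote:c%d" % i for i in range(1000) if i % 10 != 0]

ep_filter = FakeEndpoint(existing)
by_filter = broken_links_not_exists(links, ep_filter)

ep_minus = FakeEndpoint(existing)
by_minus = broken_links_minus(links, ep_minus)

print(ep_filter.requests, ep_minus.requests)
```

Both functions return the same broken links in this sketch; only the number of requests to the stand-in endpoint differs (one per link versus one in total).
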



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
