[jira] [Commented] (JENA-1187) Wrong results/performance regression when using BIND and graph pattern groups

JIRA Tue, 31 May 2016 07:01:42 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307763#comment-15307763
 ]


Eetu Mäkelä commented on JENA-1187:
-----------------------------------

Any data with rdf:type statements should produce the error. The query presented 
above is just a distilled down sample.

The actual queries are much more complex, with multiple filters, binds and so 
on. Here's an example (against http://ldf.fi/ancore/sparql, which contains some 
4 million triples):
{{code}}
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdve3: <http://rdvocab.info/ElementsGr3/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX oac: <http://www.openannotation.org/ns/>
PREFIX text: <http://jena.apache.org/text#>
SELECT ?id (?mngram AS ?label) ?ngram ?source {
  {
    VALUES ?ngram {
      "lots" "of" "ngrams" "lots of ngrams"
    }
    FILTER(STRLEN(?ngram)>2 && REGEX(?ngram,"^\\w\\. "))
    BIND(REPLACE(?ngram,"#\\d\\b","") AS ?ngram2)
    BIND(REPLACE(?ngram2,"^(\\w)\\. ","$1* ") AS ?ngram3)
    BIND(REPLACE(?ngram3,"\\b(\\w)","+$1") AS ?q)
    ?id text:query (?q 3) .
    ?id rdfs:label ?mngram .
    FILTER(STRSTARTS(?mngram,SUBSTR(?ngram2,1,1)))
    FILTER(CONTAINS(SUBSTR(?mngram,2),SUBSTR(?ngram2,4)))
    BIND(4 AS ?source)
  } UNION {
    VALUES ?ngram {
      "lots" "of" "ngrams" "lots of ngrams"
    }
    FILTER(STRLEN(?ngram)>2)
    BIND(REPLACE(?ngram,"#\\d\\b","") AS ?ngram2)
    BIND(REPLACE(?ngram2,"\\W$","") AS ?ngram3)
    {
      BIND(?ngram3 AS ?mngram)
      ?c dct:title|skos:prefLabel|rdfs:label|skos:altLabel|foaf:name ?mngram .
      BIND(1 AS ?source)    
    } UNION {
      BIND(STRLANG(CONCAT(UCASE(SUBSTR(?ngram3,1,1)),SUBSTR(?ngram3,2)),"la") 
AS ?mngram)
      ?c rdfs:label ?mngram .
      BIND(3 AS ?source)
    }
    ?c 
(owl:sameAs|^owl:sameAs|skos:exactMatch|^skos:exactMatch|foaf:primaryTopicOf|^foaf:primaryTopicOf)*
 ?id .
    FILTER NOT EXISTS {
      ?id a oac:Annotation .
    }
  } UNION {
        SERVICE <http://ldf.fi/dbpedia/sparql> {
          SELECT ?mngram ?ngram ?id {
            VALUES ?ngram {
      "lots" "of" "ngrams" "lots of ngrams"
        }
        FILTER(STRLEN(?ngram)>2 && UCASE(SUBSTR(?ngram,1,1))=SUBSTR(?ngram,1,1))
        BIND(REPLACE(?ngram,"#\\d\\b","") AS ?ngram2)
        BIND(REPLACE(?ngram2,"\\W$","") AS ?ngram3)
        BIND(STRLANG(?ngram3,"en") AS ?mngram)
                ?c rdfs:label ?mngram .
                FILTER(STRSTARTS(STR(?c),"http://dbpedia.org/resource/";))
                
FILTER(!STRSTARTS(STR(?c),"http://dbpedia.org/resource/Category:";))
                FILTER EXISTS { ?c a ?type }
                FILTER NOT EXISTS {
                  ?c dbo:wikiPageDisambiguates ?other .
                }
                FILTER NOT EXISTS {
                  ?c a dbo:Album .
                }
                FILTER NOT EXISTS {
                  ?c a dbo:MusicalWork .
                }
        FILTER NOT EXISTS {
          ?c a dbo:Year .
        }
        FILTER NOT EXISTS {
          ?c a dbo:Company .
        }
                {
                  ?c dbo:wikiPageRedirects ?id .
                } UNION {
                  FILTER NOT EXISTS {
                    ?c dbo:wikiPageRedirects ?other .
                  }
                  BIND(?c as ?id)
            }
      }
    }
    BIND(2 AS ?source)
  }
}
{{code}}

Looking at the above query, the issue is also that while in theory I could just 
push the VALUES, filters and so on down into the UNIONs in most cases, because 
sometimes I push down hundreds of ngrams at a time, the queries would start to 
grow in pure characters quite quickly.

> Wrong results/performance regression when using BIND and graph pattern groups
> -----------------------------------------------------------------------------
>
>                 Key: JENA-1187
>                 URL: https://issues.apache.org/jira/browse/JENA-1187
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.1.0, Fuseki 2.4.0
>            Reporter: Eetu Mäkelä
>
> I've been using SPARQL queries with BINDs and/or VALUESs that precede UNION 
> blocks. These used to work efficiently (at least in Jena 2 times) with the 
> bound value being bound also in the subpattern, but no longer do. In 
> addition, at certain times, they produce nonsensical results. For example, 
> the below returns {{rdf:type}} statements in the dataset without any regard 
> to ?cl:
> {code}
> SELECT * {
>   {
>     BIND("nonexistant" AS ?cl)
>     {
>       BIND(?cl AS ?cl2)
>       ?c a ?cl2 .
>     } UNION {
>       BIND(?cl AS ?cl2)
>       ?c a ?cl2 .
>     }
>   }
> }
> LIMIT 10
> {code}
> while if you change _just one_ of the subpatterns in the union to directly 
> refer to ?cl, it returns an empty result set:
> {code}
> SELECT * {
>   {
>     BIND("nonexistant" AS ?cl)
>     {
>       BIND(?cl AS ?cl2)
>       ?c a ?cl .
>     } UNION {
>       BIND(?cl AS ?cl2)
>       ?c a ?cl2 .
>     }
>   }
> }
> LIMIT 10
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (JENA-1187) Wrong results/performance regression when using BIND and graph pattern groups

Reply via email to