[ 
https://issues.apache.org/jira/browse/JENA-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266189#comment-15266189
 ] 

Osma Suominen commented on JENA-1172:
-------------------------------------

There are basically two ways to address this:
1. Prevent blank nodes from being indexed with jena-text
2. Add real support for blank nodes

Option 1 is trivial, though it won't fix older indexes that have already been
tainted by blank nodes.
Option 2 would require a bit more work: the text index would need to store not
only URIs but also some kind of internal identifier for blank nodes.
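
As a rough illustration of option 1, the filter could look something like the
sketch below. The Triple class and the indexable() helper are hypothetical
stand-ins made up for this example, not jena-text classes; the real change
would go wherever jena-text decides which triples to add to the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipBlankSubjects {
    // Hypothetical stand-in for an RDF triple with a literal object (not a Jena class).
    static final class Triple {
        final String subject;         // URI string, or a blank node label
        final boolean subjectIsBlank; // true when the subject is a blank node
        final String literal;         // the text that would be indexed

        Triple(String subject, boolean subjectIsBlank, String literal) {
            this.subject = subject;
            this.subjectIsBlank = subjectIsBlank;
            this.literal = literal;
        }
    }

    // Keep only triples with a URI subject, so no blank node ever reaches the index.
    static List<Triple> indexable(List<Triple> triples) {
        List<Triple> kept = new ArrayList<>();
        for (Triple t : triples) {
            if (!t.subjectIsBlank) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Triple> input = Arrays.asList(
            new Triple("http://example.org/a", false, "not blank"),
            new Triple("b0", true, "blank"));
        // Only the triple with the URI subject survives the filter.
        System.out.println(indexable(input).size() + " of " + input.size() + " triples indexed");
    }
}
```

This only affects newly indexed data; entries already in an index are untouched.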

Do blank nodes have such an identifier that could be used instead of a URI in
the text index?
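
If such an identifier exists, option 2 could be sketched roughly as below:
store blank node subjects with a "_:" prefix and turn them back into blank
nodes at query time. The Subject class and the encode()/decode() helpers are
hypothetical and written only for this example; they are not jena-text code.
The sketch assumes the indexed URIs are absolute, in which case the prefix
cannot collide with one, because a URI scheme has to start with a letter.

```java
public class SubjectCodec {
    // Marker that distinguishes a stored blank node label from a stored URI.
    static final String BNODE_PREFIX = "_:";

    // Hypothetical stand-in for an RDF subject node (not a Jena class).
    static final class Subject {
        final boolean blank; // true for a blank node, false for a URI
        final String value;  // the blank node label, or the URI string

        Subject(boolean blank, String value) {
            this.blank = blank;
            this.value = value;
        }
    }

    // Value to store in the index field that currently holds the subject URI.
    static String encode(Subject s) {
        return s.blank ? BNODE_PREFIX + s.value : s.value;
    }

    // Inverse of encode(), applied to each hit when the query is evaluated.
    static Subject decode(String stored) {
        if (stored.startsWith(BNODE_PREFIX)) {
            return new Subject(true, stored.substring(BNODE_PREFIX.length()));
        }
        return new Subject(false, stored);
    }

    public static void main(String[] args) {
        // The label is the one that appears in the error message of this issue.
        Subject bnode = new Subject(true, "3ed87b7f14f612ef53788d889f6410d6");
        String stored = encode(bnode);
        Subject back = decode(stored);
        System.out.println(stored + " decodes to blank=" + back.blank + ", label=" + back.value);
    }
}
```

Entries written by an older version of the index would have no prefix and
would still be read back as URIs, so this alone would not repair an index
that is already tainted.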

> blank nodes can break jena-text
> -------------------------------
>
>                 Key: JENA-1172
>                 URL: https://issues.apache.org/jira/browse/JENA-1172
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Text
>    Affects Versions: Fuseki 2.3.1
>            Reporter: Osma Suominen
>            Assignee: Osma Suominen
>
> Data with blank node subjects can break the jena-text index.
> For this example I use a typical jena-text configuration which indexes 
> rdfs:label. Then I add this triple:
> {noformat}
> _:b0 <http://www.w3.org/2000/01/rdf-schema#label> "blank" .
> {noformat}
> There is no error (though I remember seeing WARNINGs in other situations like 
> this) and the triple gets indexed.
> When I later execute this query:
> {noformat}
> PREFIX text: <http://jena.apache.org/text#>
> SELECT ?s { ?s text:query 'blank' }
> {noformat}
> I get this error:
> {noformat}
> 10:22:38 WARN  [5] RC = 500 : java.lang.UnsupportedOperationException: 3ed87b7f14f612ef53788d889f6410d6 is not a URI node
> org.apache.jena.ext.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.UnsupportedOperationException: 3ed87b7f14f612ef53788d889f6410d6 is not a URI node
>       at org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
>       at org.apache.jena.ext.com.google.common.cache.LocalCache.get(LocalCache.java:3937)
>       at org.apache.jena.ext.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
>       at org.apache.jena.atlas.lib.cache.CacheGuava.getOrFill(CacheGuava.java:58)
>       at org.apache.jena.query.text.TextQueryPF.query(TextQueryPF.java:291)
>       at org.apache.jena.query.text.TextQueryPF.variableSubject(TextQueryPF.java:229)
>       at org.apache.jena.query.text.TextQueryPF.exec(TextQueryPF.java:198)
>       at org.apache.jena.sparql.pfunction.PropertyFunctionBase$RepeatApplyIteratorPF.nextStage(PropertyFunctionBase.java:106)
> {noformat}
> Note that this happens any time the jena-text query happens to match a blank 
> node subject. So a single triple with a blank node subject can "taint" the 
> whole index. This is what happens with LCSH, which for whatever reason 
> happens to contain a few hundred blank nodes that have a skos:prefLabel 
> property (among almost 8M triples that generally use URIs for everything).



