[jira] [Commented] (JENA-1652) jena-text analyzer regression

Osma Suominen (JIRA) Fri, 14 Dec 2018 10:03:49 -0800


    [ 
https://issues.apache.org/jira/browse/JENA-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721644#comment-16721644
 ]


Osma Suominen commented on JENA-1652:
-------------------------------------

I can think of several ways to fix the original problem I found, where Skosmos 
queries stopped working:

# Work around the problem on the Skosmos side, i.e. lowercase all query strings 
that are passed to jena-text. This is probably not a good solution, since 
other jena-text users would still be affected by the change.
# Change jena-text to lowercase the query string before it is passed to 
QueryParser.parse().
# Figure out some other way to force Lucene to normalize the query string. I 
looked for such a facility but couldn't find one. Lucene 6.4 had a 
[lowerCaseExpandedTerms|https://github.com/apache/lucene-solr/blob/branch_6_4/lucene/queryparser/src/java/org/apache/lucene/queryparser/classic/QueryParserBase.java#L60]
 setting that controlled this and defaulted to true. This setting no longer 
exists in Lucene 7.4+.
# Report this as a Lucene bug.
# Some combination of the above.
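
A minimal sketch of option 2 in plain Java, assuming the lowercasing happens on 
the raw query string just before it is handed to QueryParser.parse(). The class 
and method names (QueryStringNormalizer, lowercaseTerms) are hypothetical and 
not part of jena-text or Lucene. It leaves the tokens AND, OR, NOT and TO 
alone, since the classic query syntax only recognizes them as operators in 
upper case:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not an existing jena-text or Lucene class.
final class QueryStringNormalizer {
    // Tokens that must stay upper case to keep their meaning as operators.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT", "TO"));

    // A token is a run of non-whitespace; whitespace between tokens is copied as-is.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    private QueryStringNormalizer() {}

    // Lowercases every whitespace-delimited token except the operators above.
    static String lowercaseTerms(String query) {
        Matcher m = TOKEN.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group();
            String replacement = OPERATORS.contains(token)
                    ? token
                    : token.toLowerCase(Locale.ROOT);
            // quoteReplacement keeps '$' and '\' in the query from being
            // interpreted as regex replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this, 'G*' becomes 'g*' and 'Guppy AND Fish*' becomes 'guppy AND fish*'. 
The sketch is deliberately naive: it would also lowercase field names in 
field:term queries and text inside quoted phrases, so a real change in 
jena-text would need more care.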

Any ideas?

> jena-text analyzer regression
> -----------------------------
>
>                 Key: JENA-1652
>                 URL: https://issues.apache.org/jira/browse/JENA-1652
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Text
>    Affects Versions: Jena 3.10.0
>         Environment: Ubuntu 16.04
> java version "1.8.0_191"
> Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
>            Reporter: Osma Suominen
>            Assignee: Osma Suominen
>            Priority: Major
>             Fix For: Jena 3.10.0
>
>
> I noticed that Skosmos unit tests are failing when run with Fuseki 3.10 
> snapshots:
> https://github.com/NatLibFi/Skosmos/issues/828
> Digging a bit deeper, it seems that jena-text is no longer applying the 
> analyzer on query strings as it used to in 3.9.0. The most likely reason for 
> this change seems to be the Lucene upgrade (JENA-1621) which may have 
> affected how analyzers are applied.
> Here is the text analyzer configuration I'm using:
> {noformat}
> <#indexLucene> a text:TextIndexLucene ;
>     ##text:directory <file:/tmp/lucene> ;
>     text:directory "mem" ;
>     text:entityMap <#entMap> ;
>     text:storeValues true ;
>     .
> <#entMap> a text:EntityMap ;
>     text:entityField      "uri" ;
>     text:graphField       "graph" ; ## enable graph-specific indexing
>     text:defaultField     "pref" ; ## Must be defined in the text:map
>     text:uidField         "uid" ;
>     text:langField        "lang" ;
>     text:map (
>          # skos:prefLabel
>          [ text:field "pref" ;
>            text:predicate skos:prefLabel ;
>            text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
>          # skos:altLabel
>          [ text:field "alt" ;
>            text:predicate skos:altLabel ;
>            text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
>          # skos:hiddenLabel
>          [ text:field "hidden" ;
>            text:predicate skos:hiddenLabel ;
>            text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
>          ) .
> {noformat}
> Here is a minimal test file that I load into the default graph:
> {noformat}
> <http://example.org/guppy> <http://www.w3.org/2004/02/skos/core#prefLabel> 
> "Guppy"@en-gb .
> {noformat}
> This is the query I'm using:
> {noformat}
> PREFIX text: <http://jena.apache.org/text#>
> SELECT * {
>   ?s text:query 'G*' .
> }
> {noformat}
> It returns one row (?s=<http://example.org/guppy>) on Fuseki 3.9.0 but 
> nothing with today's 3.10 snapshot.
> If I change the 'G*' to lowercase 'g*', I also get the expected match with 
> the 3.10 snapshot. So the analyzer, which should lowercase everything and 
> thus make the case of the query string irrelevant, does not seem to be 
> applied to the query string.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
