Osma Suominen created JENA-1652:
-----------------------------------

             Summary: jena-text analyzer regression
                 Key: JENA-1652
                 URL: https://issues.apache.org/jira/browse/JENA-1652
             Project: Apache Jena
          Issue Type: Bug
          Components: Text
    Affects Versions: Jena 3.10.0
         Environment: Ubuntu 16.04
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

            Reporter: Osma Suominen
            Assignee: Osma Suominen
             Fix For: Jena 3.10.0


I noticed that Skosmos unit tests are failing when run with Fuseki 3.10 
snapshots:
https://github.com/NatLibFi/Skosmos/issues/828

Digging a bit deeper, it seems that jena-text no longer applies the analyzer 
to query strings as it did in 3.9.0. The most likely cause is the Lucene 
upgrade (JENA-1621), which may have changed how analyzers are applied.

Here is the text analyzer configuration I'm using:

{noformat}
<#indexLucene> a text:TextIndexLucene ;
    ##text:directory <file:/tmp/lucene> ;
    text:directory "mem" ;
    text:entityMap <#entMap> ;
    text:storeValues true ;
    .

<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:graphField       "graph" ; ## enable graph-specific indexing
    text:defaultField     "pref" ; ## Must be defined in the text:map
    text:uidField         "uid" ;
    text:langField        "lang" ;
    text:map (
         # skos:prefLabel
         [ text:field "pref" ;
           text:predicate skos:prefLabel ;
           text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
         # skos:altLabel
         [ text:field "alt" ;
           text:predicate skos:altLabel ;
           text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
         # skos:hiddenLabel
         [ text:field "hidden" ;
           text:predicate skos:hiddenLabel ;
           text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
         ) .
{noformat}

Here is a minimal test file that I load into the default graph:

{noformat}
<http://example.org/guppy> <http://www.w3.org/2004/02/skos/core#prefLabel> "Guppy"@en-gb .
{noformat}

This is the query I'm using:

{noformat}
PREFIX text: <http://jena.apache.org/text#>
SELECT * {
  ?s text:query 'G*' .
}
{noformat}

It returns one row (?s=<http://example.org/guppy>) on Fuseki 3.9.0 but nothing 
with today's 3.10 snapshot.

If I change 'G*' to lowercase 'g*', I get the expected match with the 3.10 
snapshot as well. The analyzer should lowercase everything, so the case of the 
query string should be irrelevant; it therefore seems that the analyzer is not 
being applied to the query string.
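
To make the expectation concrete, here is a toy model in plain Java. It is 
not Jena or Lucene code: the class and method names are made up for 
illustration, and it only assumes that a lowercasing keyword analyzer turns 
the whole literal into a single lowercased term. It shows why a prefix query 
matches only when the same lowercasing is applied to the query string.

```java
import java.util.Locale;

// Hypothetical stand-in for a lowercasing keyword analyzer; not the Jena class.
class AnalyzerSketch {

    // Keep the whole input as one term and lowercase it.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // A prefix query such as 'G*' matches if the indexed term starts with the prefix.
    static boolean prefixMatches(String indexedTerm, String queryPrefix) {
        return indexedTerm.startsWith(queryPrefix);
    }

    public static void main(String[] args) {
        String indexed = analyze("Guppy"); // term stored at index time

        // Analyzer applied to the query prefix (behaviour seen with 3.9.0)
        System.out.println(prefixMatches(indexed, analyze("G")));

        // Analyzer not applied to the query prefix (behaviour seen with the 3.10 snapshot)
        System.out.println(prefixMatches(indexed, "G"));
    }
}
```

In this model the first check succeeds and the second fails, which mirrors 
the difference between 'G*' on 3.9.0 and on the 3.10 snapshot.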



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
