[jira] [Commented] (JENA-1652) jena-text analyzer regression

Osma Suominen (JIRA) Mon, 17 Dec 2018 00:09:12 -0800


    [ 
https://issues.apache.org/jira/browse/JENA-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722763#comment-16722763
 ]


Osma Suominen commented on JENA-1652:
-------------------------------------

The Skosmos build now passes: 
https://travis-ci.org/NatLibFi/Skosmos/jobs/466985759

I checked the current analyzers under org.apache.jena.query.text.analyzer:

* ConfigurableAnalyzer has a normalize() method
* IndexingMultilingualAnalyzer doesn't have a normalize() method
* LowerCaseKeywordAnalyzer now has a normalize() method
* MultilingualAnalyzer doesn't have a normalize() method
* QueryMultilingualAnalyzer doesn't have a normalize() method

So I wonder whether the *MultilingualAnalyzers also need this kind of fix. I'm
not using them myself, so I'm unsure what the needs are here. But it looks as
if all three of them inherit from AnalyzerWrapper, which delegates the
normalize operation to the wrapped analyzer:
https://github.com/apache/lucene-solr/blob/c07df196664b84cd2d58ce1ba9040a6b06e0a3c5/lucene/core/src/java/org/apache/lucene/analysis/AnalyzerWrapper.java#L141

So probably everything is OK here.
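
To illustrate the delegation I mean, here is a minimal sketch in plain Java.
It does not use the Lucene API: SimpleAnalyzer, KeywordLike,
LowerCaseKeywordLike and PerFieldWrapper are hypothetical stand-ins, and the
real Analyzer.normalize() has a different signature. The point is only that a
wrapper with no normalization logic of its own still normalizes correctly, as
long as it forwards the call to the analyzer it wraps for that field.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class NormalizeDelegationSketch {

    /** Hypothetical stand-in for an analyzer; NOT the Lucene Analyzer API. */
    interface SimpleAnalyzer {
        String normalize(String fieldName, String text);
    }

    /** Leaves query terms untouched, like a plain keyword analyzer would. */
    static class KeywordLike implements SimpleAnalyzer {
        public String normalize(String fieldName, String text) {
            return text;
        }
    }

    /** Lowercases query terms, mirroring what the analyzer does at index time. */
    static class LowerCaseKeywordLike implements SimpleAnalyzer {
        public String normalize(String fieldName, String text) {
            return text.toLowerCase(Locale.ROOT);
        }
    }

    /** Wrapper without normalization logic of its own: it only delegates. */
    static class PerFieldWrapper implements SimpleAnalyzer {
        private final Map<String, SimpleAnalyzer> perField;
        private final SimpleAnalyzer fallback;

        PerFieldWrapper(Map<String, SimpleAnalyzer> perField, SimpleAnalyzer fallback) {
            this.perField = perField;
            this.fallback = fallback;
        }

        SimpleAnalyzer getWrappedAnalyzer(String fieldName) {
            return perField.getOrDefault(fieldName, fallback);
        }

        public String normalize(String fieldName, String text) {
            // Forward to whichever analyzer is configured for this field.
            return getWrappedAnalyzer(fieldName).normalize(fieldName, text);
        }
    }

    /** Builds a wrapper where only the "pref" field lowercases its terms. */
    static PerFieldWrapper buildExample() {
        Map<String, SimpleAnalyzer> perField = new HashMap<>();
        perField.put("pref", new LowerCaseKeywordLike());
        return new PerFieldWrapper(perField, new KeywordLike());
    }

    public static void main(String[] args) {
        PerFieldWrapper wrapper = buildExample();
        System.out.println(wrapper.normalize("pref", "G")); // prints "g"
        System.out.println(wrapper.normalize("uri", "G"));  // prints "G"
    }
}
```

If the real wrappers behave like this, the *MultilingualAnalyzers get correct
normalization for free whenever the analyzers they wrap implement it.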

> jena-text analyzer regression
> -----------------------------
>
>                 Key: JENA-1652
>                 URL: https://issues.apache.org/jira/browse/JENA-1652
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Text
>    Affects Versions: Jena 3.10.0
>         Environment: Ubuntu 16.04
> java version "1.8.0_191"
> Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
>            Reporter: Osma Suominen
>            Assignee: Code Ferret
>            Priority: Major
>             Fix For: Jena 3.10.0
>
>
> I noticed that Skosmos unit tests are failing when run with Fuseki 3.10 
> snapshots:
> https://github.com/NatLibFi/Skosmos/issues/828
> Digging a bit deeper, it seems that jena-text is no longer applying the 
> analyzer on query strings as it used to in 3.9.0. The most likely reason for 
> this change seems to be the Lucene upgrade (JENA-1621) which may have 
> affected how analyzers are applied.
> Here is the text analyzer configuration I'm using:
> {noformat}
> <#indexLucene> a text:TextIndexLucene ;
>     ##text:directory <file:/tmp/lucene> ;
>     text:directory "mem" ;
>     text:entityMap <#entMap> ;
>     text:storeValues true ;
>     .
> <#entMap> a text:EntityMap ;
>     text:entityField      "uri" ;
>     text:graphField       "graph" ; ## enable graph-specific indexing
>     text:defaultField     "pref" ; ## Must be defined in the text:map
>     text:uidField         "uid" ;
>     text:langField        "lang" ;
>     text:map (
>          # skos:prefLabel
>          [ text:field "pref" ;
>            text:predicate skos:prefLabel ;
>            text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
>          # skos:altLabel
>          [ text:field "alt" ;
>            text:predicate skos:altLabel ;
>            text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
>          # skos:hiddenLabel
>          [ text:field "hidden" ;
>            text:predicate skos:hiddenLabel ;
>            text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
>          ) .
> {noformat}
> Here is a minimal test file that I load into the default graph:
> {noformat}
> <http://example.org/guppy> <http://www.w3.org/2004/02/skos/core#prefLabel> 
> "Guppy"@en-gb .
> {noformat}
> This is the query I'm using:
> {noformat}
> PREFIX text: <http://jena.apache.org/text#>
> SELECT * {
>   ?s text:query 'G*' .
> }
> {noformat}
> It returns one row (?s=<http://example.org/guppy>) on Fuseki 3.9.0 but 
> nothing with today's 3.10 snapshot.
> If I change the 'G*' to lowercase 'g*' then I get the expected match also 
> with the 3.10 snapshot. So the analyzer (which should lowercase everything 
> and thus the case of the query string should be irrelevant) seems not to be 
> applied for the query string.
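
The symptom described above can be modelled in a few lines of plain Java.
This is a simplified sketch, not jena-text or Lucene code: analyze() and
prefixMatches() are hypothetical helpers, and the "index" is just a set of
strings. It assumes that the indexed term was lowercased at index time and
that a wildcard query such as 'G*' is matched as a prefix against the
indexed terms.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class PrefixQuerySketch {

    /** Hypothetical stand-in for the lowercasing keyword analyzer. */
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    /** True if any indexed term starts with the given prefix. */
    static boolean prefixMatches(Set<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Index time: "Guppy" goes through the analyzer and is stored as "guppy".
        Set<String> index = new TreeSet<>();
        index.add(analyze("Guppy"));

        String rawPrefix = "G"; // the prefix part of the query string 'G*'

        // Query prefix used as-is: no match, as seen with the 3.10 snapshot.
        System.out.println(prefixMatches(index, rawPrefix));          // prints false

        // Query prefix normalized first: match, as seen with 3.9.0.
        System.out.println(prefixMatches(index, analyze(rawPrefix))); // prints true
    }
}
```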



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
