[
https://issues.apache.org/jira/browse/JENA-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721644#comment-16721644
]
Osma Suominen commented on JENA-1652:
-------------------------------------
I can think of several ways to fix the original problem, where Skosmos
queries stopped working:
# Work around the problem on the Skosmos side, i.e. lowercase all query strings
that are passed to jena-text. This is probably not a very good solution as
other jena-text users may still be affected by the change.
# Change jena-text to lowercase the query string before it is passed to
QueryParser.parse().
# Figure out some other way to force Lucene to normalize the query string. I
tried to look for such a facility but couldn't find one. Lucene 6.4 had a
[lowerCaseExpandedTerms|https://github.com/apache/lucene-solr/blob/branch_6_4/lucene/queryparser/src/java/org/apache/lucene/queryparser/classic/QueryParserBase.java#L60]
setting that controlled this and defaulted to true. This setting no longer
exists in 7.4+.
# Report this as a Lucene bug.
# Some combination of the above.
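To make option 2 concrete, here is a rough sketch of lowercasing a query string before it reaches QueryParser.parse(). The class QueryCaseNormalizer and its normalize method are hypothetical names for illustration, not existing jena-text or Lucene API. The sketch assumes that every indexed field uses a lowercasing analyzer, and it leaves the keywords AND, OR, NOT and TO alone because the classic Lucene query syntax only recognises them in upper case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of jena-text or Lucene. */
final class QueryCaseNormalizer {

    // Keywords that the classic Lucene query syntax only recognises in upper case.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT", "TO"));

    private QueryCaseNormalizer() {}

    /**
     * Lowercases every whitespace-separated token of the query string,
     * except tokens that are exactly one of the operator keywords.
     * Whitespace is copied through unchanged.
     */
    static String normalize(String queryString) {
        StringBuilder out = new StringBuilder(queryString.length());
        int n = queryString.length();
        int i = 0;
        while (i < n) {
            int start = i;
            if (Character.isWhitespace(queryString.charAt(i))) {
                // Copy a run of whitespace as-is.
                while (i < n && Character.isWhitespace(queryString.charAt(i))) {
                    i++;
                }
                out.append(queryString, start, i);
            } else {
                // Collect one token and lowercase it unless it is an operator.
                while (i < n && !Character.isWhitespace(queryString.charAt(i))) {
                    i++;
                }
                String token = queryString.substring(start, i);
                out.append(OPERATORS.contains(token)
                        ? token
                        : token.toLowerCase(Locale.ROOT));
            }
        }
        return out.toString();
    }
}
```

With this, QueryCaseNormalizer.normalize("G*") gives "g*", which is the form that matches in the 3.10 snapshot. The obvious limitation is that field names and quoted phrases are lowercased too, so this would be wrong for any field whose analyzer preserves case.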
Any ideas?
> jena-text analyzer regression
> -----------------------------
>
> Key: JENA-1652
> URL: https://issues.apache.org/jira/browse/JENA-1652
> Project: Apache Jena
> Issue Type: Bug
> Components: Text
> Affects Versions: Jena 3.10.0
> Environment: Ubuntu 16.04
> java version "1.8.0_191"
> Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
> Reporter: Osma Suominen
> Assignee: Osma Suominen
> Priority: Major
> Fix For: Jena 3.10.0
>
>
> I noticed that Skosmos unit tests are failing when run with Fuseki 3.10
> snapshots:
> https://github.com/NatLibFi/Skosmos/issues/828
> Digging a bit deeper, it seems that jena-text is no longer applying the
> analyzer on query strings as it used to in 3.9.0. The most likely reason for
> this change seems to be the Lucene upgrade (JENA-1621) which may have
> affected how analyzers are applied.
> Here is the text analyzer configuration I'm using:
> {noformat}
> <#indexLucene> a text:TextIndexLucene ;
>     ##text:directory <file:/tmp/lucene> ;
>     text:directory "mem" ;
>     text:entityMap <#entMap> ;
>     text:storeValues true ;
>     .
> <#entMap> a text:EntityMap ;
>     text:entityField "uri" ;
>     text:graphField "graph" ; ## enable graph-specific indexing
>     text:defaultField "pref" ; ## Must be defined in the text:map
>     text:uidField "uid" ;
>     text:langField "lang" ;
>     text:map (
>         # skos:prefLabel
>         [ text:field "pref" ;
>           text:predicate skos:prefLabel ;
>           text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
>         # skos:altLabel
>         [ text:field "alt" ;
>           text:predicate skos:altLabel ;
>           text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
>         # skos:hiddenLabel
>         [ text:field "hidden" ;
>           text:predicate skos:hiddenLabel ;
>           text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
>     ) .
> {noformat}
> Here is a minimal test file that I load into the default graph:
> {noformat}
> <http://example.org/guppy> <http://www.w3.org/2004/02/skos/core#prefLabel>
> "Guppy"@en-gb .
> {noformat}
> This is the query I'm using:
> {noformat}
> PREFIX text: <http://jena.apache.org/text#>
> SELECT * {
>     ?s text:query 'G*' .
> }
> {noformat}
> It returns one row (?s=<http://example.org/guppy>) on Fuseki 3.9.0 but
> nothing with today's 3.10 snapshot.
> If I change the 'G*' to lowercase 'g*', the 3.10 snapshot also returns the
> expected match. The analyzer should lowercase everything, which would make
> the case of the query string irrelevant, so it seems that it is not being
> applied to the query string.