[jira] [Commented] (JENA-1250) Upgrade text search to latest Lucene

Osma Suominen (JIRA) Sun, 30 Oct 2016 14:50:24 -0700

    [ https://issues.apache.org/jira/browse/JENA-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620630#comment-15620630 ]


Osma Suominen commented on JENA-1250:
-------------------------------------

I took a look at the test failure with 6.2.1. It seems that the behavior of 
AnalyzingQueryParser changed in Lucene 6.2.0/6.2.1 in a way that breaks some 
of the jena-text custom analyzers. The change was made in LUCENE-7355, in 
this commit: 
https://github.com/apache/lucene-solr/commit/7c2e7a0fb80a5bf733cf710aee6cbf01d02629eb

We can already upgrade to 6.1.0; you just need to change the VER setting in 
TextIndexLucene to a version that it knows about, e.g. 6_1_0 works fine.

I think that our code could be made compatible with Lucene 6.2.1 by adding 
normalize methods to our custom analyzers, similar to how all the analyzers in 
Lucene were patched in LUCENE-7355.

Regarding your comment about encoding language in the field name: this isn't 
really about storing the language in the field name (the lang field is still 
used for that), but about keeping the different language values - that have 
been analyzed differently - in separate fields so it is impossible to mix them 
up by mistake.

Personally I'm not super happy about the way I had to 
[patch in|https://github.com/jmvanel/jena/commit/072afffbf24d68575e1b9cf886a5998739cb5ca9#diff-48122cd8c64b1e120b37581f50fc55ceR330]
the language tag to the field name in TextIndexLucene.parseQuery. Currently, 
query strings with field names are generated outside TextIndexLucene (in 
TextQueryPF), so by the time they reach TextIndexLucene it's almost too late 
to switch the field name. The replaceFirst hack works, but there may be edge 
cases that are broken; in any case it's very hackish. I may try to come up 
with a different solution, but it would involve shifting responsibilities 
somehow between those two classes.
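
To make the edge-case concern concrete, here is a minimal sketch of a 
replaceFirst-style rewrite. It is not the actual jena-text code: the class 
name, the helper addLangToField and the "field_lang" naming scheme are all 
assumptions made for illustration. It only shows what happens when the first 
colon in a query string is assumed to terminate a field name.

```java
public class FieldNameRewriteSketch {

    // Hypothetical helper (not from jena-text): appends "_<lang>" to whatever
    // precedes the first colon, on the assumption that it is a field name.
    static String addLangToField(String queryString, String lang) {
        if (lang == null || lang.isEmpty()) {
            return queryString; // no language given, leave the query untouched
        }
        // replaceFirst treats its first argument as a regex; ":" is a plain literal.
        return queryString.replaceFirst(":", "_" + lang + ":");
    }

    public static void main(String[] args) {
        // Intended case: the first colon really does follow a field name.
        System.out.println(addLangToField("label:word", "en"));
        // prints: label_en:word

        // Edge case: no field prefix, but a colon inside a quoted phrase.
        System.out.println(addLangToField("\"10:30\" meeting", "en"));
        // prints: "10_en:30" meeting
    }
}
```

In the second call the rewrite changes the search phrase itself, which is the 
kind of breakage a string-level patch cannot rule out. Choosing the field name 
before the query string is assembled would avoid it.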

> Upgrade text search to latest Lucene
> ------------------------------------
>
>                 Key: JENA-1250
>                 URL: https://issues.apache.org/jira/browse/JENA-1250
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Jena
>            Reporter: Jean-Marc Vanel
>
> We are currently at Lucene 4.9.1,
> which is quite outdated compared to the latest Lucene, which is 6.2.1.
> Note that there is a project to add a simple completion feature in addition 
> to the existing simple search.
> But it would be better to do that on an updated Lucene dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
