Hi,

 

I think your use case is broken because you don’t use 
https://issues.apache.org/jira/browse/LUCENE-5803

 

I see that you are already using PER_FIELD_REUSE_STRATEGY for the top-level 
Analyzer. But as caching still uses the reuse strategy of the inner analyzers, the 
wrong one might be picked (I did not fully check your code, but I suspect 
something like this). To build something like FileAnalyzer, use 
http://lucene.apache.org/core/6_4_1/core/org/apache/lucene/analysis/DelegatingAnalyzerWrapper.html
(introduced in 4.10) as the base class, and don’t extend Analyzer directly. It 
correctly delegates to the wrapped analyzers based on field name, *without* 
wrapping their components (it is something that purely delegates per field).

 

The correct Analyzer to delegate to (per field name) is returned by 
implementing the abstract getWrappedAnalyzer(fieldName). FYI, this requires that 
all delegates are Analyzers, too; you cannot lazily create 
TokenStreamComponents. Important: create all wrapped analyzers early and not on 
demand, as that will also break! So make FileAnalyzer do everything on 
construction: create all delegates there and just implement 
getWrappedAnalyzer(fieldName).
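A rough illustration of one way on-demand delegates break, assuming (as in Lucene, where an Analyzer instance holds its own stored components) that the reuse slot lives inside each delegate instance. The classes below (`Delegate`, `LazyWrapper`, `EagerWrapper`) are hypothetical and stdlib-only: a wrapper that creates a new delegate on every lookup never finds anything to reuse, while one that builds all delegates in its constructor returns the same instance every time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model, not Lucene's API: each delegate keeps its own reuse slot,
// similar to how a Lucene Analyzer instance holds its stored components.
final class Delegate {
    private final AtomicInteger builds; // counts how often components are created
    private Object cached;              // this instance's reuse slot

    Delegate(AtomicInteger builds) {
        this.builds = builds;
    }

    Object components() {
        if (cached == null) {
            builds.incrementAndGet();
            cached = new Object(); // stands in for expensive TokenStreamComponents
        }
        return cached;
    }
}

// Anti-pattern: a fresh delegate on every lookup, so its reuse slot is always empty.
final class LazyWrapper {
    final AtomicInteger builds = new AtomicInteger();

    Delegate wrappedFor(String field) {
        return new Delegate(builds);
    }
}

// Recommended shape: all delegates created once in the constructor and kept in
// an unmodifiable map; the lookup only reads it.
final class EagerWrapper {
    final AtomicInteger builds = new AtomicInteger();
    private final Map<String, Delegate> delegates;
    private final Delegate fallback;

    EagerWrapper(String... fields) {
        Map<String, Delegate> m = new HashMap<>();
        for (String f : fields) {
            m.put(f, new Delegate(builds));
        }
        this.delegates = Collections.unmodifiableMap(m);
        this.fallback = new Delegate(builds);
    }

    Delegate wrappedFor(String field) {
        return delegates.getOrDefault(field, fallback);
    }
}
```

Calling `wrappedFor("full").components()` three times builds components three times on `LazyWrapper` but only once on `EagerWrapper`.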

 

Lucene’s Analyzers should all be unmodifiable and have only final fields, so 
they should not carry any state (like modifiable config). Anything else may 
cause problems. So create all delegates on construction and pass all 
parameters on construction, too. A good way to do this is a builder 
pattern, like the one used by Lucene’s CustomAnalyzer.
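A minimal sketch of the builder idea, assuming a delegate can be reduced to a function from text to tokens. `ImmutableFileAnalyzer`, its `Builder` and `withField` are hypothetical and much simpler than Lucene's CustomAnalyzer; the point is only the shape: all configuration goes into the builder, and `build()` returns an object with final fields and a frozen delegate map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stdlib-only sketch of the builder idea. It is NOT Lucene's
// CustomAnalyzer API; a "delegate" here is just a function from text to tokens.
final class ImmutableFileAnalyzer {
    private final Map<String, Function<String, List<String>>> delegates; // read-only after construction
    private final Function<String, List<String>> fallback;

    private ImmutableFileAnalyzer(Builder b) {
        // Copy, then freeze: later builder changes cannot leak into this instance.
        this.delegates = Collections.unmodifiableMap(new HashMap<>(b.delegates));
        this.fallback = b.fallback;
    }

    static Builder builder(Function<String, List<String>> fallback) {
        return new Builder(fallback);
    }

    List<String> analyze(String field, String text) {
        return delegates.getOrDefault(field, fallback).apply(text);
    }

    // Mutable only while configuring; build() hands out an immutable analyzer.
    static final class Builder {
        private final Map<String, Function<String, List<String>>> delegates = new HashMap<>();
        private final Function<String, List<String>> fallback;

        private Builder(Function<String, List<String>> fallback) {
            this.fallback = fallback;
        }

        Builder withField(String field, Function<String, List<String>> delegate) {
            delegates.put(field, delegate);
            return this;
        }

        ImmutableFileAnalyzer build() {
            return new ImmutableFileAnalyzer(this);
        }
    }
}
```

Because `build()` copies the map, a `withField(...)` call made after `build()` does not affect the analyzer that was already built.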

 

The problem is that delegation also uses the reuse strategy of the inner 
analyzers, and handling that the wrong way may mix everything up. 
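A small stdlib-only model of how a reuse strategy can mix things up when the analysis chain depends on the field name. `GlobalReuse` mimics the semantics of Lucene's GLOBAL_REUSE_STRATEGY (one cached set of components for all fields) and `PerFieldReuse` those of PER_FIELD_REUSE_STRATEGY (one per field); the classes, the `full`/`refs` fields, and the string "components" are hypothetical simplifications, not Lucene's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model, not Lucene's classes. "Components" are reduced to a label
// naming the analysis chain that was built for a field.
interface ReuseModel {
    String get(String field);
    void put(String field, String components);
}

// One cached value shared by every field (semantics of GLOBAL_REUSE_STRATEGY).
final class GlobalReuse implements ReuseModel {
    private String cached;
    public String get(String field) { return cached; }
    public void put(String field, String components) { cached = components; }
}

// One cached value per field name (semantics of PER_FIELD_REUSE_STRATEGY).
final class PerFieldReuse implements ReuseModel {
    private final Map<String, String> cached = new HashMap<>();
    public String get(String field) { return cached.get(field); }
    public void put(String field, String components) { cached.put(field, components); }
}

// An analyzer whose chain depends on the field, like a per-field dispatcher.
final class FieldDependentAnalyzer {
    private final ReuseModel reuse;
    private final Function<String, String> createComponents;

    FieldDependentAnalyzer(ReuseModel reuse, Function<String, String> createComponents) {
        this.reuse = reuse;
        this.createComponents = createComponents;
    }

    // Reuse-or-create flow: look in the cache first, build only on a miss.
    String componentsFor(String field) {
        String c = reuse.get(field);
        if (c == null) {
            c = createComponents.apply(field);
            reuse.put(field, c);
        }
        return c;
    }
}
```

With `GlobalReuse`, asking for `refs` after `full` silently returns the chain built for `full`; with `PerFieldReuse`, each field gets its own chain.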

 

Uwe

 

-----

Uwe Schindler

Achterdiek 19, D-28357 Bremen

http://www.thetaphi.de

eMail: [email protected]

 

From: Ľuboš Koščo [mailto:[email protected]] 
Sent: Friday, February 17, 2017 9:27 AM
To: Michael McCandless <[email protected]>
Cc: Lucene/Solr dev <[email protected]>
Subject: Re: On LUCENE-5611 and 6.4.1

 

One more question before I can work on tests:

how does recent Lucene pick the appropriate analyzer for a doc? 

Were there any changes in that area since 4.7.1?

(assuming we conclude that the indexing chain didn't influence this and still 
uses the properly picked analyzer)

(I checked the changelogs and didn't find any suspicious change in that area ...)

 

thnx

L

 

 

On 11 February 2017 at 00:47, Michael McCandless <[email protected]> wrote:

Could you make a small standalone test case showing what used to work
and what no longer works?

I don't think that issue was supposed to alter how IndexWriter
interacts with the analysis chain.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Feb 10, 2017 at 9:48 AM, Ľuboš Koščo <[email protected]> wrote:
> Or rather: how can I make the doubly inherited analyzer (at the bottom of the
> inheritance chain) be used again, instead of being hidden by its parent, a direct
> descendant of Analyzer?
> (parent:
> https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/analysis/FileAnalyzer.java
> child:
> https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/analysis/java/JavaAnalyzer.java
> - looking at the above, the inheritance is even deeper: Analyzer -> FileAnalyzer
> -> ... -> JavaAnalyzer as the last child)
>
> (funnily enough, the code on our side that creates docs didn't really change
> since 4.7.1, but new Lucene now picks FileAnalyzer over any other analyzer
> for createComponents anyway)
>
> tia
> L
>
> On 10 February 2017 at 13:41, Ľuboš Koščo <[email protected]> wrote:
>>
>> Hi guys, Mike
>>
>> is there any chance I can somehow get the indexing chain to behave as it
>> did before LUCENE-5611, in 6.4.1?
>>
>> We used to have analyzers that inherited from Analyzer multiple levels deep
>> (e.g. a second child that relaxed and overrode createComponents), and Lucene
>> used to run them properly for the appropriate docs,
>> but after LUCENE-5611 I can see the chain changed and only the first child
>> is always taken into account, even though the document is handled by the
>> proper analyzer ...
>> (basically, between 4.7.1 and 6.4.1 something changed that made Lucene just
>> ignore the second child of the analyzer and always use the first one
>> (and its parent, the direct override of createComponents))
>> Some code pointers on what used to work and now doesn't:
>> https://github.com/OpenGrok/OpenGrok/issues/1376
>> (I also tried to dig through the changelogs, and the only thing I found is
>> around 5611, hence this silly question)
>>
>> any clues on how to get the old behaviour back?
>>
>> thnx
>> L
>>
>

 
