Hi,
I think your use case is broken because you don’t use https://issues.apache.org/jira/browse/LUCENE-5803. I see that you are already using PER_FIELD_REUSE_STRATEGY for the top-level Analyzer. But as caching still uses the reuse strategy of the inner analyzers, the wrong one might be picked (I did not fully check your code, but I suspect something like this).

To build something like FileAnalyzer, use DelegatingAnalyzerWrapper (introduced in 4.10, http://lucene.apache.org/core/6_4_1/core/org/apache/lucene/analysis/DelegatingAnalyzerWrapper.html) as the base class and don’t extend Analyzer directly. It delegates correctly to the inner analyzers based on field name *without* wrapping their components (it is something that just delegates per field). You return the correct Analyzer to delegate to (per field name) by implementing the abstract “getWrappedAnalyzer(fieldname)”. FYI, this requires that all delegates are Analyzers, too; you cannot lazily create TokenStreamComponents.

Important: create all wrapped analyzers early and not on demand, as creating them on demand will also break! So make FileAnalyzer do everything on construction: create all delegates there and then just implement getWrappedAnalyzer(fieldname). Lucene’s Analyzers should all be unmodifiable and have only final fields, so they should not have any state (like modifiable config). Everything else may cause problems. So create all delegates on construction and also pass all parameters on construction. A good way to do this is to use some “builder pattern”, like Lucene’s CustomAnalyzer. The problem is that delegation also uses the reuse strategy of the inner analyzers, and handling that in the wrong way may mix everything up.
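As a rough sketch of the structure described above (not tested against the OpenGrok code): the class name `FieldDelegatingAnalyzer`, its `Builder`, and the idea of keying delegates in a map are assumptions made for illustration. Only `DelegatingAnalyzerWrapper`, `getWrappedAnalyzer` and `PER_FIELD_REUSE_STRATEGY` come from Lucene itself.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.DelegatingAnalyzerWrapper;

// Hypothetical example class: delegates per field name to analyzers that
// were all created up front. It holds only final, unmodifiable state.
public final class FieldDelegatingAnalyzer extends DelegatingAnalyzerWrapper {

  private final Analyzer defaultAnalyzer;
  private final Map<String, Analyzer> perField;

  private FieldDelegatingAnalyzer(Analyzer defaultAnalyzer,
                                  Map<String, Analyzer> perField) {
    // The fallback strategy is per field, because each field may map to a
    // different delegate.
    super(PER_FIELD_REUSE_STRATEGY);
    this.defaultAnalyzer = defaultAnalyzer;
    // Defensive copy, so later changes to the builder cannot leak in.
    this.perField = Collections.unmodifiableMap(new HashMap<>(perField));
  }

  // Only a lookup happens here; no analyzer is created on demand.
  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    Analyzer delegate = perField.get(fieldName);
    return delegate != null ? delegate : defaultAnalyzer;
  }

  public static Builder builder(Analyzer defaultAnalyzer) {
    return new Builder(defaultAnalyzer);
  }

  // Hypothetical builder: collects every delegate before the analyzer exists.
  public static final class Builder {
    private final Analyzer defaultAnalyzer;
    private final Map<String, Analyzer> perField = new HashMap<>();

    private Builder(Analyzer defaultAnalyzer) {
      this.defaultAnalyzer = Objects.requireNonNull(defaultAnalyzer);
    }

    public Builder add(String fieldName, Analyzer analyzer) {
      perField.put(Objects.requireNonNull(fieldName),
                   Objects.requireNonNull(analyzer));
      return this;
    }

    public FieldDelegatingAnalyzer build() {
      return new FieldDelegatingAnalyzer(defaultAnalyzer, perField);
    }
  }
}
```

With this shape, something like `FieldDelegatingAnalyzer.builder(defaultAnalyzer).add("full", fullTextAnalyzer).build()` (field name and analyzer variables are placeholders) yields an analyzer whose delegates are fixed at construction time.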
Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: [email protected]

From: Ľuboš Koščo [mailto:[email protected]]
Sent: Friday, February 17, 2017 9:27 AM
To: Michael McCandless <[email protected]>
Cc: Lucene/Solr dev <[email protected]>
Subject: Re: On LUCENE-5611 and 6.4.1

One more Q before I can work on tests
how does recent lucene pick appropriate analyzer for the doc?
Were you doing some changes in that area since 4.7.1 ?
(if we decide the indexing chain didn't influence this and still uses analyzer properly picked)
(I checked changelogs and didn't find any suspicious change in that area ... )

thnx
L

On 11 February 2017 at 00:47, Michael McCandless <[email protected]> wrote:

Could you make a small standalone test case showing what used to work and what no longer works?

I don't think that issue was supposed to alter how IndexWriter interacts with the analysis chain.

Mike McCandless
http://blog.mikemccandless.com

On Fri, Feb 10, 2017 at 9:48 AM, Ľuboš Koščo <[email protected]> wrote:
> Resp. how to make the double inherited analyzer (on the bottom of
> inheritance) be used again, instead of hidden by its father direct
> descendant of Analyzer?
> (father:
> https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/analysis/FileAnalyzer.java
> child:
> https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/analysis/java/JavaAnalyzer.java
> - looking at above it's even deeper inheritance, so Analyzer -> FileAnalyzer
> -> ... ->JavaAnalyzer as the last child)
>
> (funny enough the code on our side that creates docs didn't really change
> since 4.7.1 , but new lucene now picks FileAnalyzer over any other analyzer
> for createComponents anyways)
>
> tia
> L
>
> On 10 February 2017 at 13:41, Ľuboš Koščo <[email protected]> wrote:
>>
>> Hi guys, Mike
>>
>> is there any chance I can somehow get the indexing chain to behave similar
>> as before LUCENE-5611 in 6.4.1 ?
>>
>> We used to have analyzers that inherited multiple times from Analyzer
>> (e.g. second child and relaxed and overriden createComponents) and lucene
>> used to run them for appropriate docs properly
>> but after LUCENE-5611 I can see the chain changed and only the first child
>> is always taken into account, even though the document is handled by proper
>> analyzer ...
>> (basically between 4.7.1 and 6.4.1 something changed that made lucene just
>> ignore second child of analyzer and won't use it and always use first one
>> (and its father, the direct override of createComponents))
>> Some code pointers on what used to work and now isn't :
>> https://github.com/OpenGrok/OpenGrok/issues/1376
>> (and I tried to dig the changelogs and the only thing I found is really
>> around 5611, hence this silly Q)
>>
>> any clues how to get old behaviour back?
>>
>> thnx
>> L
>>
>
