On 6/12/2013 7:02 PM, Steven Schlansker wrote:
On Jun 12, 2013, at 3:44 PM, Michael Sokolov <msoko...@safaribooksonline.com> 
wrote:

You may not have noticed that CharFilter extends Reader.  The expected pattern 
here is that you chain instances together -- your CharFilter should act as 
*input* to the Analyzer, I think.  Don't think in terms of extending these 
analysis classes (except the base ones designed for it): compose them so that 
each consumes the one before it.

Hi Mike,

Hm, that may work out.  I am a little surprised because I thought the intention 
is that you set the Analyzer up as part of the configuration, and when you add 
documents, the analyzer takes care of all text processing.  In particular this 
means that now I have to ensure that the same transformation is done at query 
time, and I thought the analyzer abstraction was supposed to avoid this.

But if this is how it should be done, it could work.  Thanks for the pointer.

Steven


Sorry, I was in a hurry and didn't think it through. I went back and looked at my code, and the pattern is different from what I remembered. I have:

public final class DefaultAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer tokenizer = new StandardTokenizer(IndexConfiguration.LUCENE_VERSION, reader);
        TokenStream tokenStream = new LowerCaseFilter(IndexConfiguration.LUCENE_VERSION, tokenizer);
        // ASCIIFoldingFilter
        // Stemming
        return new TokenStreamComponents(tokenizer, tokenStream);
    }

}

You were exactly right that subclassing Analyzer and overriding initReader is the way to go. The composition I was talking about happens among the filters, inside createComponents. I guess you have to duplicate the internals of StandardAnalyzer, but I don't think there's all that much in there?
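
Since CharFilter extends Reader, the initReader hook amounts to wrapping the incoming Reader in another Reader before the tokenizer sees it. Because that wrapping lives inside the analyzer, the same transformation applies at index time and at query time. Here is a rough, JDK-only sketch of the wrapping idea. It uses no Lucene classes at all: DashToSpaceReader, ReaderChainSketch, and this one-argument initReader are hypothetical stand-ins, and readAll simply plays the role of whatever consumes the reader downstream.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderChainSketch {

    /** Hypothetical character-level filter: replaces every '-' with a space. */
    static final class DashToSpaceReader extends FilterReader {
        DashToSpaceReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == '-' ? ' ' : c;   // -1 (end of stream) passes through unchanged
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            for (int i = off; i < off + n; i++) {   // loop body is skipped when n == -1
                if (buf[i] == '-') {
                    buf[i] = ' ';
                }
            }
            return n;
        }
    }

    /** Stand-in for the initReader hook: wrap the raw input before anything consumes it. */
    static Reader initReader(Reader raw) {
        return new DashToSpaceReader(raw);
    }

    /** Drains a Reader into a String; stands in for the downstream consumer. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Both "indexing" and "querying" would go through the same initReader call.
        System.out.println(readAll(initReader(new StringReader("full-text search"))));
    }
}
```

In the real Analyzer subclass, the equivalent step would be returning the wrapped reader from initReader(String fieldName, Reader reader), with your CharFilter in place of the hypothetical DashToSpaceReader.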

I have used AnalyzerWrapper to switch between multiple analyzers based on the input, but it doesn't let you do anything with the internals of the analyzer(s) it wraps.

-Mike

