I am updating an analyzer to work with Lucene 4.0. It uses a particular 
configuration of PerFieldAnalyzerWrapper: a few of the fields use a custom 
analyzer with StandardTokenizer, and the other fields use KeywordAnalyzer with 
KeywordTokenizer. The older version of the analyzer looks like this:

public class MyPerFieldAnalyzer extends Analyzer {
  PerFieldAnalyzerWrapper _analyzer;

  public MyPerFieldAnalyzer() {
    Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();

    analyzerMap.put("IDNumber", new KeywordAnalyzer());
    ...
    ...

    _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap);
  }

  @Override
  public TokenStream tokenStream(String fieldname, Reader reader) {
    TokenStream stream = _analyzer.tokenStream(fieldname, reader);
    return stream;
  }
}

In older versions of Lucene it was necessary to define a tokenStream function, 
but in 4.0 it is not (in fact, Analyzer.tokenStream is declared final, so you 
can't override it). Instead, it is necessary to define a createComponents 
function that takes the same arguments as the tokenStream function and returns 
a TokenStreamComponents object. The TokenStreamComponents constructor has a 
Tokenizer argument and a TokenStream argument. I assume I can use the same code 
to provide the TokenStream object as was used in the older analyzer's 
tokenStream function, but I don't see how to provide a Tokenizer object, unless 
it is by creating a separate map of field names to Tokenizers that works the 
same way the analyzer map does.

Is that the best way to do this, or is there a better way? For example, would 
it be better to inherit from AnalyzerWrapper instead of from Analyzer? In that 
case I would need to define getWrappedAnalyzer and wrapComponents functions. I 
think I would still need to put the same kind of logic in the wrapComponents 
function that specifies which tokenizer to use with which field, though. It 
looks like the PerFieldAnalyzerWrapper itself assumes that the same tokenizer 
will be used with all fields, as its wrapComponents function ignores the 
fieldname parameter. I would appreciate any help in finding out the best way to 
update this analyzer and to write the required function(s).
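
To make the AnalyzerWrapper option concrete, here is a minimal sketch of the 
delegation pattern it seems to imply. It does not use Lucene at all: Components, 
SimpleAnalyzer and SimpleAnalyzerWrapper are hypothetical stand-ins for 
TokenStreamComponents, Analyzer and AnalyzerWrapper, the Reader argument is 
left out, and the "title" field is made up for illustration. The assumption 
being tested is that the wrapper picks an analyzer per field and that analyzer 
builds its own components, in which case the tokenizer would come along with 
the analyzer and no separate tokenizer map would be needed.

```java
import java.util.HashMap;
import java.util.Map;

public class PerFieldDelegationSketch {

    // Hypothetical stand-in for TokenStreamComponents: it only records
    // the name of the tokenizer that produced it.
    static final class Components {
        final String tokenizerName;
        Components(String tokenizerName) { this.tokenizerName = tokenizerName; }
    }

    // Hypothetical stand-in for Analyzer (the Reader argument is omitted).
    interface SimpleAnalyzer {
        Components createComponents(String fieldName);
    }

    // Hypothetical stand-in for AnalyzerWrapper: pick an analyzer for the
    // field, let it build its own components, then let the subclass wrap them.
    abstract static class SimpleAnalyzerWrapper implements SimpleAnalyzer {
        protected abstract SimpleAnalyzer getWrappedAnalyzer(String fieldName);
        protected abstract Components wrapComponents(String fieldName, Components components);

        @Override
        public final Components createComponents(String fieldName) {
            Components inner = getWrappedAnalyzer(fieldName).createComponents(fieldName);
            return wrapComponents(fieldName, inner);
        }
    }

    static final class MyPerFieldAnalyzer extends SimpleAnalyzerWrapper {
        private final SimpleAnalyzer defaultAnalyzer;
        private final Map<String, SimpleAnalyzer> analyzerMap;

        MyPerFieldAnalyzer(SimpleAnalyzer defaultAnalyzer, Map<String, SimpleAnalyzer> analyzerMap) {
            this.defaultAnalyzer = defaultAnalyzer;
            this.analyzerMap = analyzerMap;
        }

        // The only per-field logic: look the field up, else use the default.
        @Override
        protected SimpleAnalyzer getWrappedAnalyzer(String fieldName) {
            SimpleAnalyzer analyzer = analyzerMap.get(fieldName);
            return analyzer != null ? analyzer : defaultAnalyzer;
        }

        // Pass-through: the components already carry the right tokenizer,
        // so the field name is not needed here.
        @Override
        protected Components wrapComponents(String fieldName, Components components) {
            return components;
        }
    }

    // Builds the sketch analyzer and reports which tokenizer a field ends up with.
    public static String tokenizerFor(String fieldName) {
        SimpleAnalyzer keyword = field -> new Components("KeywordTokenizer");
        SimpleAnalyzer custom = field -> new Components("StandardTokenizer");
        Map<String, SimpleAnalyzer> analyzerMap = new HashMap<>();
        analyzerMap.put("IDNumber", keyword);
        return new MyPerFieldAnalyzer(custom, analyzerMap).createComponents(fieldName).tokenizerName;
    }

    public static void main(String[] args) {
        System.out.println("IDNumber -> " + tokenizerFor("IDNumber"));
        System.out.println("title    -> " + tokenizerFor("title"));
    }
}
```

In this model a wrapComponents that ignores the field name still yields a 
different tokenizer per field, because the choice was already made in 
getWrappedAnalyzer. Whether the real AnalyzerWrapper behaves the same way 
would need to be checked against the Lucene 4.0 source.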
Thanks,
Mike
