Hi Chris,
So if I change my analyzer to inherit from AnalyzerWrapper, I need to define a
getWrappedAnalyzer function and a wrapComponents function. I think
getWrappedAnalyzer is straightforward, but I don't understand who is calling
wrapComponents and for what purpose, so I don't know how to define it. This is
my modified analyzer code with ??? in the places I don't know how to define.
Thanks,
Mike
public class MyPerFieldAnalyzer extends AnalyzerWrapper {
  Map<String, Analyzer> _analyzerMap = new HashMap<String, Analyzer>();
  Analyzer _defaultAnalyzer;

  public MyPerFieldAnalyzer() {
    _analyzerMap.put("IDNumber", new KeywordAnalyzer());
    ...
    ...
    _defaultAnalyzer = new CustomAnalyzer();
  }

  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    Analyzer analyzer;
    if (_analyzerMap.containsKey(fieldName)) {
      analyzer = _analyzerMap.get(fieldName);
    } else {
      analyzer = _defaultAnalyzer;
    }
    return analyzer;
  }

  @Override
  protected TokenStreamComponents wrapComponents(String fieldName,
      TokenStreamComponents components) {
    Tokenizer tokenizer = ???;
    TokenStream tokenStream = ???;
    return new TokenStreamComponents(tokenizer, tokenStream);
  }
}
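
Looking at the 4.0 source, PerFieldAnalyzerWrapper's own wrapComponents seems
to just hand back the components it is given (which matches what I noticed
earlier about it ignoring the fieldname parameter), so my best guess is a
simple pass-through like this, but I'd like to confirm that before relying on
it:

  @Override
  protected TokenStreamComponents wrapComponents(String fieldName,
      TokenStreamComponents components) {
    // Guess: the analyzer returned by getWrappedAnalyzer has already built
    // the right tokenizer/filter chain for this field, so return it as-is.
    return components;
  }
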
-----Original Message-----
From: Chris Male [mailto:[email protected]]
Sent: Tuesday, September 25, 2012 5:34 PM
To: [email protected]
Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question
Ah I see.
The problem is that we don't really encourage wrapping of Analyzers. Your
Analyzer wraps a PerFieldAnalyzerWrapper, so it needs to extend
AnalyzerWrapper, not Analyzer. AnalyzerWrapper handles the createComponents
call and just requires you to give it the Analyzer(s) you've wrapped through
getWrappedAnalyzer.
You can avoid all this entirely, of course, by not extending Analyzer at all
and just instantiating a PerFieldAnalyzerWrapper instance directly in place of
your MyPerFieldAnalyzer.
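
Roughly (untested, and assuming your CustomAnalyzer already implements
createComponents for the 4.0 API), that just means building the wrapper
wherever you need an Analyzer:

  Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
  analyzerMap.put("IDNumber", new KeywordAnalyzer());
  // ... your other per-field analyzers ...

  // Fields not in the map fall back to CustomAnalyzer
  Analyzer analyzer =
      new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap);

and then hand that analyzer to your IndexWriterConfig / query parser as
before, with no MyPerFieldAnalyzer class at all.
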
On Wed, Sep 26, 2012 at 12:25 PM, Mike O'Leary <[email protected]> wrote:
> Hi Chris,
> In a nutshell, my question is, what should I put in place of ??? to
> make this into a Lucene 4.0 analyzer?
>
> public class MyPerFieldAnalyzer extends Analyzer {
> PerFieldAnalyzerWrapper _analyzer;
>
> public MyPerFieldAnalyzer() {
> Map<String, Analyzer> analyzerMap = new HashMap<String,
> Analyzer>();
>
> analyzerMap.put("IDNumber", new KeywordAnalyzer());
> ...
> ...
>
> _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(),
> analyzerMap);
> }
>
> @Override
> public TokenStreamComponents createComponents(String fieldname, Reader reader) {
> Tokenizer source = ???;
> TokenStream stream = _analyzer.tokenStream(fieldname, reader);
> return new TokenStreamComponents(source, stream);
> }
> }
>
> I must be missing something obvious. Can you tell me what it is?
> Thanks,
> Mike
>
> -----Original Message-----
> From: Chris Male [mailto:[email protected]]
> Sent: Tuesday, September 25, 2012 5:18 PM
> To: [email protected]
> Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question
>
> Hi Mike,
>
> I don't really understand what problem you're having.
>
> PerFieldAnalyzerWrapper, like all AnalyzerWrappers, uses
> Analyzer.PerFieldReuseStrategy which means it caches the
> TokenStreamComponents per field. The TokenStreamComponents cached are
> created by retrieving the wrapped Analyzer through
> AnalyzerWrapper.getWrappedAnalyzer(Field) and calling createComponents.
> In PerFieldAnalyzerWrapper, getWrappedAnalyzer pulls the Analyzer
> from the Map you provide.
>
> Consequently, to use your custom Analyzers and KeywordAnalyzer, all you
> need to do is define your custom Analyzer using the new Analyzer API
> (that is using TokenStreamComponents), create your Map from that
> Analyzer and KeywordAnalyzer and pass it into PerFieldAnalyzerWrapper.
> This seems to be what you're doing in your code sample.
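>
> For example (just a sketch: I'm assuming StandardTokenizer since you
> mentioned it, and LowerCaseFilter is only a placeholder for whatever filters
> your CustomAnalyzer really uses), createComponents would look roughly like:
>
>   @Override
>   protected TokenStreamComponents createComponents(String fieldName,
>       Reader reader) {
>     // Build the tokenizer/filter chain for this analyzer; the
>     // TokenStreamComponents pair is what Lucene 4.0 caches and reuses.
>     Tokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
>     TokenStream filter = new LowerCaseFilter(Version.LUCENE_40, source);
>     return new TokenStreamComponents(source, filter);
>   }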
>
> Are you able to expand on the problem you're encountering?
>
> On Wed, Sep 26, 2012 at 11:57 AM, Mike O'Leary <[email protected]> wrote:
>
> > I am updating an analyzer that uses a particular configuration of
> > the PerFieldAnalyzerWrapper to work with Lucene 4.0. A few of the
> > fields use a custom analyzer and StandardTokenizer and the other
> > fields use the KeywordAnalyzer and KeywordTokenizer. The older
> > version of the analyzer looks like this:
> >
> > public class MyPerFieldAnalyzer extends Analyzer {
> > PerFieldAnalyzerWrapper _analyzer;
> >
> > public MyPerFieldAnalyzer() {
> > Map<String, Analyzer> analyzerMap = new HashMap<String,
> > Analyzer>();
> >
> > analyzerMap.put("IDNumber", new KeywordAnalyzer());
> > ...
> > ...
> >
> > _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(),
> > analyzerMap);
> > }
> >
> > @Override
> > public TokenStream tokenStream(String fieldname, Reader reader) {
> > TokenStream stream = _analyzer.tokenStream(fieldname, reader);
> > return stream;
> > }
> > }
> >
> > In older versions of Lucene it is necessary to define a tokenStream
> > function, but in 4.0 it is not (in fact, the tokenStream method is
> > declared final, so you can't override it). Instead, it is necessary to define a
> > createComponents function that takes the same arguments as the
> > tokenStream function and returns a TokenStreamComponents object. The
> > TokenStreamComponents constructor has a Tokenizer argument and a
> > TokenStream argument. I assume I can just use the same code to
> > provide the TokenStream object as was used in the older analyzer's
> > tokenStream function, but I don't see how to provide a Tokenizer
> > object, unless it is by creating a separate map of field names to
> > Tokenizers that works the same way the analyzer map does. Is that
> > the best way to do this, or is there a better way? For example,
> > would it be better to inherit from AnalyzerWrapper instead of from
> > Analyzer? In that case I would need to define getWrappedAnalyzer and
> > wrappedComponents functions. I think in that case I would still need
> > to put the same kind of logic in the wrapComponents function that
> > specifies which tokenizer to use with which field, though. It looks
> > like the PerFieldAnalyzerWrapper itself assumes that the same
> > tokenizer will be used with all fields, as its wrapComponents
> > function ignores the fieldname parameter. I would appreciate any
> > help in finding out the best way to update this analyzer
> > and to write the required function(s).
> >
> > Thanks,
> > Mike
> >
>
>
>
> --
> Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
>
--
Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]