On Friday, September 19, 2003, at 07:45 PM, Erik Hatcher wrote:
On Friday, September 19, 2003, at 11:15 AM, hui wrote:
1. Move the Analyzer down to field level from document level so some fields
could use a special analyzer. Other fields still use the default
analyzer from the document level.
For example, I do not need to index the number for the "content" field. It
helps me reduce the index size a lot when I have some excel files. But I
always need the "created_date" to be indexed though it is a number field.


I know there are some workarounds put in the group, but I think it should be
a good feature to have.

The "workaround" is to write a custom analyzer and have it do the desired thing per-field.


Hmmm.... just thinking out loud here without knowing if this is possible, but could a generic "wrapper" Analyzer be written that allows other analyzers to be used under the covers based on a field name/analyzer mapping? If so, that would be quite cool and save folks from having to write custom analyzers as much to handle this pretty typical use-case. I'll look into this more in the very near future personally, but feel free to have a look at this yourself and see what you can come up with.

What about something like this?


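import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

// Delegates to a per-field analyzer, falling back to a default one.
public class PerFieldWrapperAnalyzer extends Analyzer {
  private Analyzer defaultAnalyzer;
  private Map analyzerMap = new HashMap();

  public PerFieldWrapperAnalyzer(Analyzer defaultAnalyzer) {
    this.defaultAnalyzer = defaultAnalyzer;
  }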

  public void addAnalyzer(String fieldName, Analyzer analyzer) {
    analyzerMap.put(fieldName, analyzer);
  }

  public TokenStream tokenStream(String fieldName, Reader reader) {
    Analyzer analyzer = (Analyzer) analyzerMap.get(fieldName);
    if (analyzer == null) {
      analyzer = defaultAnalyzer;
    }

    return analyzer.tokenStream(fieldName, reader);
  }
}

This would allow you to construct a single analyzer out of others on a per-field basis, with a default one for any field that does not have a special one. Whether the constructor should take the map or an addAnalyzer method should be provided is debatable, but I prefer the addAnalyzer way. Maybe addAnalyzer could return 'this' so you could chain: new PerFieldWrapperAnalyzer(new StandardAnalyzer()).addAnalyzer("field1", new WhitespaceAnalyzer()).addAnalyzer(.....). And I'm more inclined to call this thing PerFieldAnalyzerWrapper instead. Any naming suggestions?
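
A minimal sketch of the chaining idea follows. To stay self-contained it does not depend on Lucene: FieldAnalyzer, WhitespaceSplitter, LowercaseSplitter and PerFieldWrapper are hypothetical stand-ins invented for illustration, operating on plain strings instead of Reader/TokenStream. Only the "look up by field name, fall back to the default, return 'this' from addAnalyzer" logic mirrors the class above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for Lucene's Analyzer (assumption, not the Lucene API).
interface FieldAnalyzer {
    List<String> analyze(String fieldName, String text);
}

// Stand-in analyzer: splits on whitespace and keeps tokens unchanged.
class WhitespaceSplitter implements FieldAnalyzer {
    public List<String> analyze(String fieldName, String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }
}

// Stand-in analyzer: lowercases, then splits on anything that is not a-z or 0-9.
class LowercaseSplitter implements FieldAnalyzer {
    public List<String> analyze(String fieldName, String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }
}

// Wrapper that picks an analyzer per field name and falls back to a default.
class PerFieldWrapper implements FieldAnalyzer {
    private final FieldAnalyzer defaultAnalyzer;
    private final Map<String, FieldAnalyzer> analyzerMap = new HashMap<>();

    PerFieldWrapper(FieldAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Returns 'this' so that registrations can be chained.
    PerFieldWrapper addAnalyzer(String fieldName, FieldAnalyzer analyzer) {
        analyzerMap.put(fieldName, analyzer);
        return this;
    }

    public List<String> analyze(String fieldName, String text) {
        FieldAnalyzer analyzer = analyzerMap.getOrDefault(fieldName, defaultAnalyzer);
        return analyzer.analyze(fieldName, text);
    }
}

class ChainingDemo {
    public static void main(String[] args) {
        FieldAnalyzer wrapper = new PerFieldWrapper(new LowercaseSplitter())
            .addAnalyzer("field1", new WhitespaceSplitter());
        // "field1" uses the whitespace splitter; any other field uses the default.
        System.out.println(wrapper.analyze("field1", "Hello, World"));
        System.out.println(wrapper.analyze("content", "Hello, World"));
    }
}
```

Returning 'this' keeps the constructor signature small while still letting the whole mapping be set up in a single expression; passing a prebuilt map to the constructor would work just as well.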

This simple little class would seem to be the answer to a very commonly asked question.

Thoughts? Should this be made part of the core?

Erik

