On Friday, September 19, 2003, at 11:15 AM, hui wrote:
1. Move the Analyzer down to field level from document level so that some
fields could have a special analyzer applied. Other fields still use the
default analyzer from the document level.
For example, I do not need to index the numbers in the "content" field. It
helps me reduce the index size a lot when I have some Excel files. But I
always need "created_date" to be indexed, even though it is a number field.
I know there are some workarounds posted to the group, but I think it would
be a good feature to have.
The "workaround" is to write a custom analyzer and have it do the desired thing per-field.
Hmmm.... just thinking out loud here without knowing if this is possible, but could a generic "wrapper" Analyzer be written that allows other analyzers to be used under the covers based on a field name/analyzer mapping? If so, that would be quite cool and save folks from having to write custom analyzers as much to handle this pretty typical use-case. I'll look into this more in the very near future personally, but feel free to have a look at this yourself and see what you can come up with.
What about something like this?
    import java.io.Reader;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;

    public class PerFieldWrapperAnalyzer extends Analyzer {
      private Analyzer defaultAnalyzer;
      private Map analyzerMap = new HashMap();  // field name -> Analyzer

      public PerFieldWrapperAnalyzer(Analyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
      }

      public void addAnalyzer(String fieldName, Analyzer analyzer) {
        analyzerMap.put(fieldName, analyzer);
      }

      public TokenStream tokenStream(String fieldName, Reader reader) {
        Analyzer analyzer = (Analyzer) analyzerMap.get(fieldName);
        if (analyzer == null) {
          // no special analyzer registered for this field
          analyzer = defaultAnalyzer;
        }

        return analyzer.tokenStream(fieldName, reader);
      }
    }
This would allow you to construct a single analyzer out of others, on a per-field basis, including a default one for any fields that do not have a special one. Whether the constructor should take the map or the addAnalyzer method should be used instead is debatable, but I prefer the addAnalyzer way. Maybe addAnalyzer could return 'this' so you could chain: new PerFieldWrapperAnalyzer(new StandardAnalyzer()).addAnalyzer("field1", new WhitespaceAnalyzer()).addAnalyzer(.....). And I'm more inclined to call this thing PerFieldAnalyzerWrapper instead. Any naming suggestions?
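To make the chaining idea concrete, here is a rough, self-contained sketch. So that it runs without Lucene on the classpath, it uses hypothetical stand-ins that are not Lucene classes: SketchAnalyzer (returning a plain list of tokens instead of a TokenStream), KeepAllSketchAnalyzer, DropNumbersSketchAnalyzer, and PerFieldSketchWrapper. It also assumes a newer Java with generics. Only the shape of addAnalyzer returning 'this' is the point; treat the rest as illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Lucene's Analyzer; the real tokenStream()
// returns a TokenStream, this one returns a plain list of tokens.
abstract class SketchAnalyzer {
    abstract List<String> tokens(String fieldName, String text);
}

// Stand-in: splits on whitespace and keeps every token, numbers included.
class KeepAllSketchAnalyzer extends SketchAnalyzer {
    List<String> tokens(String fieldName, String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }
}

// Stand-in: splits on whitespace and drops tokens made only of digits.
class DropNumbersSketchAnalyzer extends SketchAnalyzer {
    List<String> tokens(String fieldName, String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty() && !t.matches("\\d+")) out.add(t);
        }
        return out;
    }
}

// The wrapper: delegates to a per-field analyzer, else to the default.
class PerFieldSketchWrapper extends SketchAnalyzer {
    private final SketchAnalyzer defaultAnalyzer;
    private final Map<String, SketchAnalyzer> analyzerMap = new HashMap<>();

    PerFieldSketchWrapper(SketchAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Returns 'this' so that registrations can be chained.
    PerFieldSketchWrapper addAnalyzer(String fieldName, SketchAnalyzer analyzer) {
        analyzerMap.put(fieldName, analyzer);
        return this;
    }

    List<String> tokens(String fieldName, String text) {
        SketchAnalyzer analyzer = analyzerMap.getOrDefault(fieldName, defaultAnalyzer);
        return analyzer.tokens(fieldName, text);
    }
}

class PerFieldSketchDemo {
    public static void main(String[] args) {
        // Default drops numbers; "created_date" is registered to keep them.
        SketchAnalyzer analyzer =
            new PerFieldSketchWrapper(new DropNumbersSketchAnalyzer())
                .addAnalyzer("created_date", new KeepAllSketchAnalyzer());

        System.out.println(analyzer.tokens("content", "total 42 items"));
        System.out.println(analyzer.tokens("created_date", "20030919"));
    }
}
```

With that setup the "content" field loses its numeric tokens while "created_date" keeps its value, which is the case hui described. A constructor taking the whole map would work too; chaining just keeps the setup in one expression.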
This simple little class would seem to be the answer to a very commonly asked question.
Thoughts? Should this be made part of the core?
Erik