Good work, Erik.
Hui
----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Saturday, September 20, 2003 4:13 AM
Subject: per-field Analyzer (was Re: some requests)
> On Friday, September 19, 2003, at 07:45 PM, Erik Hatcher wrote:
> > On Friday, September 19, 2003, at 11:15 AM, hui wrote:
> >> 1. Move the Analyzer down to field level from document level so some
> >> fields
> >> could have a special analyzer applied. Other fields still use the default
> >> analyzer from the document level.
> >> For example, I do not need to index the number for the "content"
> >> field. It
> >> helps me reduce the index size a lot when I have some excel files.
> >> But I
> >> always need the "created_date" to be indexed though it is a number
> >> field.
> >>
> >> I know there are some workarounds put in the group, but I think it
> >> should be
> >> a good feature to have.
> >
> > The "workaround" is to write a custom analyzer and have it do the
> > desired thing per-field.
> >
> > Hmmm.... just thinking out loud here without knowing if this is
> > possible, but could a generic "wrapper" Analyzer be written that
> > allows other analyzers to be used under the covers based on a field
> > name/analyzer mapping? If so, that would be quite cool and save
> > folks from having to write custom analyzers as much to handle this
> > pretty typical use-case. I'll look into this more in the very near
> > future personally, but feel free to have a look at this yourself and
> > see what you can come up with.
>
> What about something like this?
>
> import java.io.Reader;
> import java.util.HashMap;
> import java.util.Map;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
>
> public class PerFieldWrapperAnalyzer extends Analyzer {
>   private Analyzer defaultAnalyzer;
>   private Map analyzerMap = new HashMap();  // field name -> Analyzer
>
>   public PerFieldWrapperAnalyzer(Analyzer defaultAnalyzer) {
>     this.defaultAnalyzer = defaultAnalyzer;
>   }
>
>   public void addAnalyzer(String fieldName, Analyzer analyzer) {
>     analyzerMap.put(fieldName, analyzer);
>   }
>
>   public TokenStream tokenStream(String fieldName, Reader reader) {
>     // Fall back to the default analyzer for fields with no mapping.
>     Analyzer analyzer = (Analyzer) analyzerMap.get(fieldName);
>     if (analyzer == null) {
>       analyzer = defaultAnalyzer;
>     }
>
>     return analyzer.tokenStream(fieldName, reader);
>   }
> }
>
> This would allow you to construct a single analyzer out of others, on a
> per-field basis, including a default one for any fields that do not
> have a special one. Whether the constructor should take the map or an
> addAnalyzer method should be provided is debatable, but I prefer the
> addAnalyzer way. Maybe addAnalyzer could return 'this' so you could
> chain: new PerFieldWrapperAnalyzer(new
> StandardAnalyzer()).addAnalyzer("field1", new
> WhitespaceAnalyzer()).addAnalyzer(.....). And I'm more inclined to
> call this thing PerFieldAnalyzerWrapper instead. Any naming
> suggestions?
>
> This simple little class would seem to be the answer to a very
> commonly asked question.
>
> Thoughts? Should this be made part of the core?
>
> Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
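The chaining idea from the quoted message (addAnalyzer returning 'this') could look roughly like the sketch below. To keep it runnable without Lucene on the classpath, it uses a hypothetical SimpleAnalyzer interface that returns a list of tokens in place of Lucene's Analyzer and TokenStream; the two lambda analyzers and the build() helper are likewise illustrative assumptions, not Lucene classes. It mirrors the original request: numbers are dropped for "content" but kept for every other field, such as "created_date".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PerFieldSketch {
    // Hypothetical stand-in for Lucene's Analyzer: returns tokens as a list.
    interface SimpleAnalyzer {
        List<String> tokens(String fieldName, String text);
    }

    static class PerFieldAnalyzerWrapper implements SimpleAnalyzer {
        private final SimpleAnalyzer defaultAnalyzer;
        private final Map<String, SimpleAnalyzer> analyzerMap = new HashMap<>();

        PerFieldAnalyzerWrapper(SimpleAnalyzer defaultAnalyzer) {
            this.defaultAnalyzer = defaultAnalyzer;
        }

        // Returns 'this' so that registrations can be chained.
        PerFieldAnalyzerWrapper addAnalyzer(String fieldName, SimpleAnalyzer analyzer) {
            analyzerMap.put(fieldName, analyzer);
            return this;
        }

        @Override
        public List<String> tokens(String fieldName, String text) {
            // Use the field-specific analyzer if one is registered, else the default.
            SimpleAnalyzer analyzer = analyzerMap.getOrDefault(fieldName, defaultAnalyzer);
            return analyzer.tokens(fieldName, text);
        }
    }

    // Illustrative setup: default keeps every token, "content" drops pure numbers.
    static PerFieldAnalyzerWrapper build() {
        SimpleAnalyzer whitespace =
            (field, text) -> Arrays.asList(text.trim().split("\\s+"));
        SimpleAnalyzer noNumbers =
            (field, text) -> Arrays.stream(text.trim().split("\\s+"))
                                   .filter(token -> !token.matches("\\d+"))
                                   .collect(Collectors.toList());
        return new PerFieldAnalyzerWrapper(whitespace)
            .addAnalyzer("content", noNumbers);
    }
}
```

With the real Lucene classes, the same shape would apply to tokenStream(String fieldName, Reader reader) as in the quoted code; only the return type of addAnalyzer changes.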