Re: contrib: keywordTokenStream

Wolfgang Hoschek Tue, 03 May 2005 22:14:45 -0700

On May 3, 2005, at 5:26 PM, Erik Hatcher wrote:

Wolfgang,
I've now added this.


Thanks :-)

I'm not seeing how this could be generally useful. I'm curious how you are using it and why it is better suited for what you're doing than any other analyzer.

"keyword tokenizer" is a bit overloaded terminology-wise, though - look in the contrib/analyzers/src/java area to see what I mean.

Erik

The difference between this and the KeywordTokenizer from contrib/analyzers is that it

- can operate on multiple keywords rather than just a single one, so it's slightly more general.
- takes a collection (typically of String values) as input rather than a Reader. I can see the java.io.Reader scalability rationale used throughout the analysis APIs, but for many use cases (including my own) Strings are a lot handier (and more efficient to deal with), since the string values are small anyway.

So it's a convenient way to add terms (keywords, if you like) to an index "as is", without any further transforming analysis, when they have already been parsed/massaged into strings by some existing external means (e.g. grouped regex scanning of legacy formatted text files into various fields). Most folks could write such a (non-essential) utility themselves, but it's handy in the same way that the Field.Keyword convenience infrastructure is.
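To make the idea concrete, here is a minimal, self-contained sketch of such a stream. It is not the contributed code and does not use the Lucene classes: SketchToken and KeywordStreamSketch are hypothetical stand-ins, and the one-character gap between offsets is an assumption made only so that the offsets of consecutive keywords do not touch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an analysis token: term text plus character offsets.
final class SketchToken {
    final String text;
    final int start;
    final int end;

    SketchToken(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }
}

// Hypothetical stream that emits each element of a collection as one token,
// unchanged: no splitting, lowercasing, or stop-word removal.
final class KeywordStreamSketch {
    private final Iterator<?> iter;
    private int start = 0;

    KeywordStreamSketch(Collection<?> keywords) {
        this.iter = keywords.iterator();
    }

    // Returns the next token, or null once the collection is exhausted.
    SketchToken next() {
        if (!iter.hasNext()) {
            return null;
        }
        String term = String.valueOf(iter.next());
        SketchToken token = new SketchToken(term, start, start + term.length());
        // Assumption: leave a one-character gap between consecutive keywords.
        start += term.length() + 1;
        return token;
    }

    // Convenience helper: collect the text of every token in order.
    static List<String> drain(Collection<?> keywords) {
        KeywordStreamSketch stream = new KeywordStreamSketch(keywords);
        List<String> out = new ArrayList<>();
        for (SketchToken t = stream.next(); t != null; t = stream.next()) {
            out.add(t.text);
        }
        return out;
    }
}
```

For example, KeywordStreamSketch.drain(Arrays.asList("New York", "lucene-dev")) yields the two strings exactly as given, so a multi-word value such as "New York" stays a single term instead of being split on whitespace.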

"keyword tokenizer" is a bit overloaded terminology-wise, though


If you come up with a better name feel free to rename it.

Wolfgang.

