Wolfgang,
I've now added this. I'm not seeing how this could be generally useful. I'm curious how you are using it and why it is better suited for what you're doing than any other analyzer.
"keyword tokenizer" is a bit overloaded terminology-wise, though - look in the contrib/analyzers/src/java area to see what I mean.
Erik
On May 3, 2005, at 4:26 PM, Wolfgang Hoschek wrote:
Here's a convenience add-on method to MemoryIndex. If it turns out that this could be of wider use, it could be moved into the core analysis package. For the moment the MemoryIndex might be a better home. Opinions, anyone?
Wolfgang.
/**
 * Convenience method; creates and returns a token stream that generates a
 * token for each keyword in the given collection, "as is", without any
 * transforming text analysis. The resulting token stream can be fed into
 * {@link #addField(String, TokenStream)}, perhaps wrapped into another
 * {@link org.apache.lucene.analysis.TokenFilter}, as desired.
 *
 * @param keywords
 *            the keywords to generate tokens for
 * @return the corresponding token stream
 */
public TokenStream keywordTokenStream(final Collection keywords) {
  if (keywords == null)
    throw new IllegalArgumentException("keywords must not be null");

  return new TokenStream() {
    Iterator iter = keywords.iterator();
    int pos = 0;
    int start = 0;

    public Token next() {
      if (!iter.hasNext()) return null;

      Object obj = iter.next();
      if (obj == null)
        throw new IllegalArgumentException("keyword must not be null");

      String term = obj.toString();
      Token token = new Token(term, start, start + term.length());
      start += term.length() + 1; // separate words by 1 (blank) character
      pos++;
      return token;
    }
  };
}
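
To make the offset bookkeeping concrete, here is a minimal, self-contained sketch of the same idea. It does not use Lucene: the Token and TokenStream types below are simplified, hypothetical stand-ins (as are the class name KeywordStreamSketch and the drain helper), assumed only so the example compiles on its own. It shows that each keyword becomes exactly one token, even when it contains spaces, and that consecutive tokens are separated by one character position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Sketch only: Token and TokenStream are hypothetical stand-ins,
// NOT the org.apache.lucene.analysis classes.
class KeywordStreamSketch {

    /** Hypothetical minimal token: term text plus start/end character offsets. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    /** Hypothetical minimal stream: next() returns null when exhausted. */
    interface TokenStream {
        Token next();
    }

    /** Same logic as the method above, written against the stand-in types. */
    static TokenStream keywordTokenStream(final Collection<?> keywords) {
        if (keywords == null)
            throw new IllegalArgumentException("keywords must not be null");

        return new TokenStream() {
            private final Iterator<?> iter = keywords.iterator();
            private int start = 0;

            public Token next() {
                if (!iter.hasNext()) return null;

                Object obj = iter.next();
                if (obj == null)
                    throw new IllegalArgumentException("keyword must not be null");

                String term = obj.toString();
                Token token = new Token(term, start, start + term.length());
                start += term.length() + 1; // leave a one-character gap between keywords
                return token;
            }
        };
    }

    /** Helper (hypothetical): collects all tokens of a stream into a list. */
    static List<Token> drain(TokenStream stream) {
        List<Token> out = new ArrayList<>();
        for (Token t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // "James Bond" stays a single token; no whitespace splitting happens.
        for (Token t : drain(keywordTokenStream(Arrays.asList("James Bond", "007")))) {
            System.out.println(t);
        }
    }
}
```

With the keywords "James Bond" and "007", the sketch yields two tokens with offsets 0 to 10 and 11 to 14. With the real method, the returned stream would instead be passed to addField(String, TokenStream), as the Javadoc above describes.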