contrib: keywordTokenStream

Wolfgang Hoschek Tue, 03 May 2005 13:26:43 -0700

Here's a convenience add-on method to MemoryIndex. If it turns out that this could be of wider use, it could be moved into the core analysis package. For the moment the MemoryIndex might be a better home. Opinions, anyone?

Wolfgang.

/**
 * Convenience method; Creates and returns a token stream that generates a
 * token for each keyword in the given collection, "as is", without any
 * transforming text analysis. The resulting token stream can be fed into
 * {@link #addField(String, TokenStream)}, perhaps wrapped into another
 * {@link org.apache.lucene.analysis.TokenFilter}, as desired.
 *
 * @param keywords
 *            the keywords to generate tokens for
 * @return the corresponding token stream
 */
public TokenStream keywordTokenStream(final Collection keywords) {
    if (keywords == null)
        throw new IllegalArgumentException("keywords must not be null");

    return new TokenStream() {
        Iterator iter = keywords.iterator();
        int pos = 0;
        int start = 0;

        public Token next() {
            if (!iter.hasNext()) return null;

            Object obj = iter.next();
            if (obj == null)
                throw new IllegalArgumentException("keyword must not be null");

            String term = obj.toString();
            Token token = new Token(term, start, start + term.length());
            start += term.length() + 1; // separate words by 1 (blank) character
            pos++;
            return token;
        }
    };
}
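
To make the offset arithmetic concrete, here is a minimal sketch that runs without Lucene on the classpath. The class name KeywordOffsetsSketch and its offsets method are hypothetical stand-ins, not part of MemoryIndex or Lucene; they only mirror how the method above assigns start and end offsets, as if the keywords were joined by single blanks. Unlike the token stream, which is lazy and complains about a null keyword only when next() reaches it, this sketch walks the whole collection eagerly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in, not part of Lucene: mirrors only the offset
// bookkeeping of keywordTokenStream so it can run without Lucene.
final class KeywordOffsetsSketch {

    /** Returns one {start, end} pair per keyword, in iteration order. */
    static List<int[]> offsets(Collection<?> keywords) {
        if (keywords == null)
            throw new IllegalArgumentException("keywords must not be null");

        List<int[]> spans = new ArrayList<int[]>();
        int start = 0;
        for (Object obj : keywords) {
            if (obj == null)
                throw new IllegalArgumentException("keyword must not be null");
            String term = obj.toString();
            spans.add(new int[] { start, start + term.length() });
            start += term.length() + 1; // separate words by 1 (blank) character
        }
        return spans;
    }

    public static void main(String[] args) {
        // Prints 0-3 for "foo", then 4-8 for "quux".
        for (int[] span : offsets(Arrays.asList("foo", "quux")))
            System.out.println(span[0] + "-" + span[1]);
    }
}
```

Each keyword keeps its own text untouched, and the one-character gap between consecutive spans is what the "+ 1" in the original method accounts for.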


