Hi ayyanar,

On 01/05/2009 at 12:23 PM, ayyanar wrote:
> I need a tokenizer that tokenizes a keyword as follows. Consider an
> example: "President day" should be tokenized as "President day",
> "President", "Day". This seems to combine the functionality of a keyword
> tokenizer and a whitespace tokenizer. Do we have any tokenizer that does
> this job, or do we need to write a custom one?

A ShingleFilter
<http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/shingle/ShingleFilter.html>
over a whitespace tokenizer should do the trick. By default, unigrams
(individual terms) are output in addition to shingles (token n-grams),
so "President day" yields "President", "President day" and "day".
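
To make the expected output concrete, here is a minimal plain-Java sketch of
what such a filter chain emits. It is not Lucene code: the class
ShingleSketch and its shingles method are hypothetical names invented for
this illustration, and it only mimics splitting on whitespace and emitting
unigrams plus word n-grams up to a maximum size. In Lucene itself the
equivalent would presumably be a ShingleFilter wrapped around a
WhitespaceTokenizer with a maximum shingle size of 2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Lucene: imitates the terms that a shingle
// filter over a whitespace tokenizer would produce.
class ShingleSketch {

    // Returns unigrams plus shingles of up to maxShingleSize words,
    // grouped by the position of their first word.
    static List<String> shingles(String text, int maxShingleSize) {
        List<String> out = new ArrayList<String>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+"); // whitespace tokenization
        for (int start = 0; start < words.length; start++) {
            StringBuilder sb = new StringBuilder();
            // Grow the shingle one word at a time: size 1 is the unigram.
            for (int n = 0; n < maxShingleSize && start + n < words.length; n++) {
                if (n > 0) {
                    sb.append(' ');
                }
                sb.append(words[start + n]);
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [President, President day, day]
        System.out.println(shingles("President day", 2));
    }
}
```

Note that nothing here changes case, so the last term comes out as "day"
rather than "Day"; a separate filter would be needed for that.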

Steve
