Re: ngrams in Lucene 4.3.0

Malgorzata Urbanska Mon, 15 Jul 2013 12:47:03 -0700

Thanks!



On Mon, Jul 15, 2013 at 1:31 PM, Ivan Krišto <[email protected]> wrote:
> On 07/15/2013 07:50 PM, Malgorzata Urbanska wrote:
>> Hi,
>>
>> I've been trying to figure out how to use ngrams in Lucene 4.3.0.
>> I found some examples for earlier versions, but I'm still confused.
>> As I understand it, I should:
>> 1. create a new analyzer which uses ngrams
>> 2. apply it to my indexer
>> 3. search using the same analyzer
>>
>> I found NGramTokenFilter and NGramTokenizer in the documentation, but I
>> do not understand the difference between them.
> This should be helpful:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Tokenizers
>
> Here is an example of an n-gram analyzer:
>
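> import java.io.IOException;
> import java.io.Reader;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.core.LowerCaseFilter;
> import org.apache.lucene.analysis.ngram.NGramTokenizer;
> import org.apache.lucene.analysis.standard.StandardFilter;
> import org.apache.lucene.util.Version;
>
> public class NGramAnalyzer extends Analyzer {
>     @Override
>     protected TokenStreamComponents createComponents(String fieldName,
>             Reader reader) {
>
>         // Split the raw character stream into trigrams (minGram = maxGram = 3).
>         Tokenizer src = new NGramTokenizer(reader, 3, 3);
>
>         // Pass the n-grams through StandardFilter, then lowercase them.
>         TokenStream tok = new StandardFilter(Version.LUCENE_43, src);
>         tok = new LowerCaseFilter(Version.LUCENE_43, tok);
>
>         return new TokenStreamComponents(src, tok) {
>             // This override only delegates to super, so it can be omitted.
>             @Override
>             protected void setReader(final Reader reader) throws IOException {
>                 super.setReader(reader);
>             }
>         };
>     }
> }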
>
> If, for example, you want to remove stop words from a document before
> breaking it into n-grams, then you would need:
> reader(document) -> SomeTokenizer -> StopFilter -> NGramTokenFilter
>
>
>   Regards,
>     Ivan Krišto
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
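
On the NGramTokenizer vs. NGramTokenFilter question: a rough way to picture
it is that the tokenizer builds n-grams from the raw character stream (spaces
included), while the filter builds n-grams from each token that an upstream
tokenizer has already produced. Below is a minimal plain-Java sketch of that
idea. It does not use Lucene at all; NGramSketch, tokenizerStyle and
filterStyle are made-up names for illustration only, and the real Lucene
classes differ in details such as offsets and token ordering.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // All character n-grams of length n in text, ordered by start offset.
    public static List<String> ngrams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    // Tokenizer-style (assumption: mimics n-gramming the whole input):
    // whitespace is treated like any other character.
    public static List<String> tokenizerStyle(String text, int n) {
        return ngrams(text, n);
    }

    // Filter-style (assumption: mimics tokenizer -> n-gram filter):
    // split on whitespace first, then n-gram each word separately.
    public static List<String> filterStyle(String text, int n) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            out.addAll(ngrams(word, n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenizerStyle("the cat", 3));
        System.out.println(filterStyle("the cat", 3));
    }
}
```

For "the cat" with n = 3, the tokenizer-style call yields "the", "he ", "e c",
" ca", "cat", while the filter-style call yields only "the" and "cat". Words
shorter than n produce no n-grams in the filter-style version.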



-- 
Malgorzata Urbanska (Gosia)
Graduate Assistant
Colorado State University

