Thanks!!
On Mon, Jul 15, 2013 at 1:31 PM, Ivan Krišto <ivan.kri...@gmail.com> wrote:
> On 07/15/2013 07:50 PM, Malgorzata Urbanska wrote:
>> Hi,
>>
>> I've been trying to figure out how to use n-grams in Lucene 4.3.0.
>> I found some examples for an earlier version, but I'm still confused.
>> As I understand it, I should:
>> 1. create a new analyzer which uses n-grams
>> 2. apply it to my indexer
>> 3. search using the same analyzer
>>
>> I found NGramTokenFilter and NGramTokenizer in the documentation, but I
>> do not understand what the difference between them is.
> This should be helpful:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Tokenizers
>
> Here is an example of an n-gram analyzer:
>
> import java.io.Reader;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.core.LowerCaseFilter;
> import org.apache.lucene.analysis.ngram.NGramTokenizer;
> import org.apache.lucene.analysis.standard.StandardFilter;
> import org.apache.lucene.util.Version;
>
> public class NGramAnalyzer extends Analyzer {
>     @Override
>     protected TokenStreamComponents createComponents(String fieldName,
>             Reader reader) {
>         // Cut the raw input into character 3-grams (minGram = maxGram = 3).
>         Tokenizer src = new NGramTokenizer(reader, 3, 3);
>
>         TokenStream tok = new StandardFilter(Version.LUCENE_43, src);
>         // Lowercase every gram so matching is case-insensitive.
>         tok = new LowerCaseFilter(Version.LUCENE_43, tok);
>
>         // The default setReader() already resets the tokenizer for reuse,
>         // so no anonymous subclass is needed here.
>         return new TokenStreamComponents(src, tok);
>     }
> }
>
> If, for example, you want to remove stop words from a document before
> breaking it into n-grams, then you would need:
> reader(document) -> SomeTokenizer -> StopFilter -> NGramTokenFilter
>
> Regards,
> Ivan Krišto
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

--
Malgorzata Urbanska (Gosia)
Graduate Assistant
Colorado State University
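On the question of NGramTokenizer versus NGramTokenFilter: as far as I understand it, NGramTokenizer is a Tokenizer, so it cuts the raw text itself into n-grams (in 4.3 those grams can span the spaces between words), while NGramTokenFilter is a TokenFilter that n-grams each token produced earlier in the chain. That is why a StopFilter can sit in front of it, as in Ivan's reader -> SomeTokenizer -> StopFilter -> NGramTokenFilter chain. The plain-Java sketch below only imitates those two behaviours so the difference is visible; it is not Lucene code, and the class NGramSketch, the whitespace split standing in for SomeTokenizer, and the small stop-word set standing in for StopFilter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java imitation only; NOT Lucene's NGramTokenizer or NGramTokenFilter code.
final class NGramSketch {

    // All character n-grams of one string, left to right (empty if text is shorter than n).
    static List<String> grams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    // Tokenizer-like: gram the whole lowercased input, so grams may include spaces.
    static List<String> tokenizerStyle(String text, int n) {
        return grams(text.toLowerCase(Locale.ROOT), n);
    }

    // Filter-like: split on whitespace (hypothetical stand-in for SomeTokenizer), drop
    // stop words (hypothetical stand-in for StopFilter), then gram each remaining word.
    static List<String> filterStyle(String text, int n, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty() || stopWords.contains(word)) {
                continue; // skip an empty piece from leading whitespace, and stop words
            }
            out.addAll(grams(word, n));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        System.out.println(tokenizerStyle("The cat", 3));
        System.out.println(filterStyle("The cat", 3, stopWords));
    }
}
```

For "The cat" with n = 3, the tokenizer-style version yields grams such as "he " and " ca" that cross the word boundary, whereas the filter-style version drops "the" as a stop word and yields only "cat".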
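Steps 2 and 3 in Gosia's list (index with the n-gram analyzer, then search with the same one) matter because a query only matches if it is cut into the same grams as the indexed text. A toy in-memory sketch of that idea follows; TrigramIndexSketch, analyze, add and bestMatch are hypothetical names, and scoring by the number of shared trigrams is a simplification for illustration, not how Lucene ranks results.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index over character trigrams; a hypothetical illustration, not Lucene.
final class TrigramIndexSketch {
    private static final int N = 3;
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    // The one shared "analyzer": lowercase the text, then emit its distinct trigrams.
    static Set<String> analyze(String text) {
        String s = text.toLowerCase(Locale.ROOT);
        Set<String> out = new HashSet<>();
        for (int i = 0; i + N <= s.length(); i++) {
            out.add(s.substring(i, i + N));
        }
        return out;
    }

    // Index time: analyze the document and record its id under every trigram it contains.
    int add(String doc) {
        int id = nextId++;
        for (String gram : analyze(doc)) {
            postings.computeIfAbsent(gram, k -> new HashSet<>()).add(id);
        }
        return id;
    }

    // Search time: analyze the query with the SAME method, count shared trigrams per
    // document, and return the id with the most (lower id on ties), or -1 if none match.
    int bestMatch(String query) {
        Map<Integer, Integer> hits = new HashMap<>();
        for (String gram : analyze(query)) {
            for (int id : postings.getOrDefault(gram, Collections.emptySet())) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        int best = -1;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> e : hits.entrySet()) {
            int id = e.getKey();
            int count = e.getValue();
            if (count > bestCount || (count == bestCount && id < best)) {
                best = id;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TrigramIndexSketch index = new TrigramIndexSketch();
        index.add("Lucene analyzers");
        index.add("Colorado State University");
        System.out.println(index.bestMatch("analyser"));
    }
}
```

With the two documents from main, the misspelled query "analyser" still shares "ana", "nal" and "aly" with "Lucene analyzers", so document 0 wins; if the query were analyzed differently (say, into whole words instead of trigrams), those partial matches would be lost.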