OK, I solved it. I figured it out: instead of passing my NGramQuery (the BooleanQuery built from the n-gram terms) to IndexSearcher, I was passing a plain String. :)

gosia
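P.S. For the archives, here is roughly what the working search side looks like. Treat it as a sketch, not my exact code: the helper name buildNGramQuery is made up, and it assumes the "content" field and the NGramAnalyzer from the thread below (Lucene 4.3 APIs).

    import java.io.IOException;
    import java.io.StringReader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class NGramQueryBuilder {
        // Run the query text through the SAME analyzer that was used at
        // index time, and OR together one TermQuery per n-gram token.
        public static Query buildNGramQuery(Analyzer analyzer, String field,
                String text) throws IOException {
            BooleanQuery query = new BooleanQuery(); // public ctor in Lucene 4.x
            TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
            CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                query.add(new TermQuery(new Term(field, termAtt.toString())),
                        Occur.SHOULD);
            }
            ts.end();
            ts.close();
            return query;
        }
    }

The fix was then just handing IndexSearcher that Query object instead of the raw string, something like:

    TopDocs hits = searcher.search(
            buildNGramQuery(new NGramAnalyzer(), "content", userInput), 10);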
On Tue, Jul 16, 2013 at 12:28 PM, Malgorzata Urbanska <urban...@cs.colostate.edu> wrote:
> Hi,
>
> I built my indexer with an NGramAnalyzer which uses ShingleFilter.
>
> Next I built my searcher with an NGramQuery which uses BooleanQuery:
>
>     String termToken = charTermAttribute.toString();
>     Term t = new Term("content", termToken);
>     add(new TermQuery(t), Occur.SHOULD);
>
> It looks like everything works, yet my searcher does not find any hits.
>
> I suspect my indexer code, so I tried to inspect the index, but Luke
> does not work with Lucene 4.3.0 :(
>
> Could someone give me a hint about what is happening?
> Thanks,
> gosia
>
> On Mon, Jul 15, 2013 at 1:45 PM, Malgorzata Urbanska
> <urban...@cs.colostate.edu> wrote:
>> Thanks!!
>>
>> On Mon, Jul 15, 2013 at 1:31 PM, Ivan Krišto <ivan.kri...@gmail.com> wrote:
>>> On 07/15/2013 07:50 PM, Malgorzata Urbanska wrote:
>>>> Hi,
>>>>
>>>> I've been trying to figure out how to use n-grams in Lucene 4.3.0.
>>>> I found some examples for earlier versions, but I'm still confused.
>>>> As I understand it, I should:
>>>> 1. create a new analyzer which uses n-grams
>>>> 2. apply it to my indexer
>>>> 3. search using the same analyzer
>>>>
>>>> I found NGramTokenFilter and NGramTokenizer in the documentation, but I
>>>> do not understand the difference between them.
>>> This should be helpful:
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Tokenizers
>>>
>>> Here is an example of an n-gram analyzer:
>>>
>>>     public class NGramAnalyzer extends Analyzer {
>>>         @Override
>>>         protected TokenStreamComponents createComponents(String fieldName,
>>>                 Reader reader) {
>>>             // Emit character 3-grams directly from the reader.
>>>             Tokenizer src = new NGramTokenizer(reader, 3, 3);
>>>             TokenStream tok = new StandardFilter(Version.LUCENE_43, src);
>>>             tok = new LowerCaseFilter(Version.LUCENE_43, tok);
>>>             return new TokenStreamComponents(src, tok);
>>>         }
>>>     }
>>>
>>> If, for example, you want to remove stop words from a document before
>>> breaking it into n-grams, then you would need:
>>> reader(document) -> SomeTokenizer -> StopFilter -> NGramTokenFilter
>>>
>>> Regards,
>>> Ivan Krišto

--
Malgorzata Urbanska (Gosia)
Graduate Assistant
Colorado State University

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
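For anyone who finds this thread later: the stop-words-before-n-grams chain Ivan describes (reader -> SomeTokenizer -> StopFilter -> NGramTokenFilter) could look like the sketch below. StandardTokenizer stands in for his placeholder SomeTokenizer, the English stop set and the class name are my own choices, and the constructors are the Lucene 4.3 ones.

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.core.StopFilter;
    import org.apache.lucene.analysis.ngram.NGramTokenFilter;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.util.Version;

    public class StopwordNGramAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName,
                Reader reader) {
            // Word-level tokenization first, so stop words can be dropped
            // as whole words before any n-gramming happens.
            Tokenizer src = new StandardTokenizer(Version.LUCENE_43, reader);
            TokenStream tok = new LowerCaseFilter(Version.LUCENE_43, src);
            tok = new StopFilter(Version.LUCENE_43, tok,
                    StandardAnalyzer.STOP_WORDS_SET);
            // Only the surviving words are broken into character 3-grams.
            tok = new NGramTokenFilter(tok, 3, 3);
            return new TokenStreamComponents(src, tok);
        }
    }

As with any analyzer, the same instance (or an identically configured one) has to be used for both indexing and query building, or the n-grams will not match.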