Thanks for that piece of advice.

I ended up passing my Snowball and Standard analyzers as parameters to 
ShingleAnalyzerWrapper instances and processing the output via a 
TermVectorMapper. 

It seems to work quite well.

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: 05 Sep 2012 01:53
To: java-user@lucene.apache.org
Subject: Re: Using a Lucene ShingleFilter to extract frequencies of bigrams in 
Lucene

On Tue, Sep 4, 2012 at 12:37 PM, Martin O'Shea <app...@dsl.pipex.com> wrote:
>
> Does anyone know if this can be used in conjunction with other 
> analyzers to return the frequencies of the bigrams or trigrams found, e.g.:
>
>     "please divide this please divide sentence into shingles"
>
> Would return 2 for "please divide"?
>
> I'm currently using Lucene 3.0.2 to extract frequencies of unigrams 
> from a string using a combination of a TermVectorMapper and 
> Standard/Snowball analyzers.
>
> I should add that my strings are built up from a database and then 
> indexed by Lucene in memory and are not persisted beyond this. Use of 
> other products like Solr is not intended.
>

The bigrams etc generated by shingles are terms just like the unigrams. So you 
can wrap any other analyzer with a ShingleAnalyzerWrapper if you want the 
shingles.

If you just want to use Lucene's analyzers to tokenize the text and compute 
within-document frequencies for a one-off purpose, I think indexing and 
creating term vectors could be overkill: you could just consume the tokens from 
the Analyzer and count them in a HashMap, or whatever structure you need...

There are examples in the org.apache.lucene.analysis package javadocs.
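
As a rough illustration of the "consume the tokens and count them" idea, here 
is a minimal sketch in plain Java. It does not use Lucene at all: the 
tokenize() helper is a hypothetical stand-in for whatever Analyzer you 
actually use (it only lowercases and splits on non-alphanumeric characters, 
with no stemming or stop words), and the ShingleCounter class and 
countShingles() method are names made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ShingleCounter {

    // Hypothetical stand-in for an Analyzer: lowercases the text and
    // splits it on runs of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counts how often each shingle (word n-gram) of exactly `size` tokens
    // occurs in the text. Insertion order of the map follows first occurrence.
    static Map<String, Integer> countShingles(String text, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<String> tokens = tokenize(text);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            String shingle = String.join(" ", tokens.subList(i, i + size));
            counts.merge(shingle, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> bigrams = countShingles(
                "please divide this please divide sentence into shingles", 2);
        System.out.println(bigrams.get("please divide")); // prints 2
    }
}
```

With a real Analyzer you would replace tokenize() by a loop over the token 
stream it produces. Note that Lucene's shingle filter also emits the unigrams 
by default, whereas this sketch only produces shingles of the requested size.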

--
lucidworks.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
