Thanks for that piece of advice. I ended up passing my Snowball and Standard analyzers as parameters to ShingleAnalyzerWrappers and processing the output via a TermVectorMapper.
It seems to work quite well.

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: 05 Sep 2012 01 53
To: java-user@lucene.apache.org
Subject: Re: Using a Lucene ShingleFilter to extract frequencies of bigrams in Lucene

On Tue, Sep 4, 2012 at 12:37 PM, Martin O'Shea <app...@dsl.pipex.com> wrote:
>
> Does anyone know if this can be used in conjunction with other
> analyzers to return the frequencies of the bigrams or trigrams found, e.g.:
>
> "please divide this please divide sentence into shingles"
>
> Would return 2 for "please divide"?
>
> I'm currently using Lucene 3.0.2 to extract frequencies of unigrams
> from a string using a combination of a TermVectorMapper and
> Standard/Snowball analyzers.
>
> I should add that my strings are built up from a database and then
> indexed by Lucene in memory and are not persisted beyond this. Use of
> other products like Solr is not intended.
>

The bigrams etc. generated by shingles are terms just like the unigrams. So you can wrap any other analyzer with a ShingleAnalyzerWrapper if you want the shingles.

If you just want to use Lucene's analyzers to tokenize the text and compute within-document frequencies for a one-off purpose, I think indexing and creating term vectors could be overkill: you could just consume the tokens from the Analyzer and make a hashmap or whatever you need... There are examples in the org.apache.lucene.analysis package javadocs.

--
lucidworks.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
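
For anyone reading the archive later, here is a rough sketch of the "consume the tokens and make a hashmap" idea from Robert's reply. It is plain Java with no Lucene calls: the class BigramCounter and the method countShingles are hypothetical names made up for this illustration, and splitting on whitespace is only a stand-in for the tokens an Analyzer would produce. It is not the code either poster used.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
class BigramCounter {

    // Counts every run of `size` consecutive tokens, joined by a single space.
    static Map<String, Integer> countShingles(List<String> tokens, int size) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            StringBuilder shingle = new StringBuilder();
            for (int j = i; j < i + size; j++) {
                if (j > i) {
                    shingle.append(' ');
                }
                shingle.append(tokens.get(j));
            }
            String key = shingle.toString();
            Integer seen = counts.get(key);
            counts.put(key, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumption: whitespace splitting stands in for an Analyzer's token stream.
        String text = "please divide this please divide sentence into shingles";
        List<String> tokens = Arrays.asList(text.split(" "));
        Map<String, Integer> counts = countShingles(tokens, 2);
        System.out.println(counts.get("please divide")); // prints 2
    }
}
```

With a real analyzer, stemming or stop-word removal would change the tokens, so the counts could differ from this whitespace-only version.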