Re: Multiple Analyzers

Erik Hatcher Mon, 31 Oct 2005 06:55:50 -0800

Yes, you can certainly do all three in one shot. Look at the sourcecode to StandardAnalyzer, build a custom analyzer that used it's coretokenization, then filtered through the SnowballFilter, and then acustom soundex filter (or metaphone or similar). That would make forsome mighty fuzzy searches, but I guess that is your goal.

A custom analyzer needs to be written, along with the custom soundexfilter, but should only be a handful of lines of code. Use thesource code of Lucene in Action as a basis if you'd like.


    Erik




On 31 Oct 2005, at 09:22, [EMAIL PROTECTED] wrote:

Thanks Erik.  So you're saying that my approach won't work, right?  I
understand your recommendations.  Is there a way that I can just

incorporate Stem, Soundex, and Standard into one search. In otherwords,

don't toggle anything.  Just index using custom analyzer that contains

Stem, Soundex, and Standard analyzers at once. And search usingthe customanalyzer that has Stem, Soundex, and Standard. I can live withjust havingall on compared to only having one of the three. Please, advise.Thanks

again for your swift responses.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniel Clark, Senior Consultant
Sybase Federal Professional Services
6550 Rock Spring Drive, Suite 800
Bethesda, MD  20817
Office - (301) 896-1103
Office Fax - (301) 896-1604
Mobile - (703) 403-0340
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



             Erik Hatcher
             <[EMAIL PROTECTED]

utions.com> To

                                       java-user@lucene.apache.org

10/30/200507:50 cc

PM

Subject

                                       Re: Multiple Analyzers
             Please respond to
             [EMAIL PROTECTED]
                apache.org








On 30 Oct 2005, at 11:03, [EMAIL PROTECTED] wrote:

I am implementing a search that allows the user to toggle the
Soundex and
Stem functionality on and off.  When the Soundex or Sten functions
are not
on, the search will use the StandardAnalyzer as default.  I
implemented the
Soundex (SoundexAnalyzer from book Lucene In Action) and Stem
(SnowballAnalyzer) analyzer successfully individually.  They only
work if I
use the analyzers during both indexing and searching.  The problem
I have
is that I seem to have to use one of the three analyzers or
nothing.  I
heven't been able to combine them.  The following may help better to
understand what I am trying to achieve.  Any help would be greatly
appreciated.

Indexing
========
Index using recursive TokenStream object.

public TokenStream tokenStream(String fieldName, Reader reader) {
      TokenStream result = new SoundexFilter (
                        new SnowballFilter (
                           new StopFilter (

 new
LowerCaseFilter (

new StandardFilter (
new StandardTokenizer(reader))),

StandardAnalyzer.STOP_WORDS)));
      return result;
}


So you're using a single analyzer for all fields during indexing?

Searching
==========
If user selects Soundex function, use SoundexAnalyzer in query.
Else if user selects Stem option, use SnowballAnalyzer in query.
Else use StandardAnalyzer.


But different analyzers for searching.... you have to be quite
careful about this sort of thing.  And in this particular case its
tricky.  You can't stem and soundex during indexing and then
selectively turn those features off without having indexed the
various combinations you want available during search.

One way to address this is to index the same text into multiple
fields with each using a different analysis process.  Then for
search, line up the query type (stemming, soundex, standard) with the
proper field.  (side note, this gets even more fun if you're using
QueryParser and the user enters in "field:term"!).

Another option is to index into multiple indexes - one for each type
of analysis.  I use this myself for case sensitive and insensitive
searches.

     Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Multiple Analyzers

Reply via email to