Nah, it's an analyzer, so you can just use a TermQuery (fast: exact match). At both index and query time it just maps terms to a phonetic key; typically you would put this in a separate field.
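
To make the "maps terms to a key" idea concrete, here is a rough, self-contained sketch. It is not Lucene code: the class PhoneticKeyDemo and its key/index/search methods are made-up names, a plain HashMap stands in for the separate phonetic field, a map lookup stands in for the TermQuery, and a simplified Soundex stands in for whichever phonetic algorithm you would actually configure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Lucene.
public class PhoneticKeyDemo {
    // Soundex digit for each letter A..Z ('0' = vowel or ignored letter).
    private static final String CODES = "01230120022455012623010202";

    // Maps a word to a 4-character key (simplified Soundex, an assumption
    // standing in for a real phonetic encoder).
    static String key(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') continue;      // skip non-letters
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                      // keep the first letter
                prev = code;
            } else {
                if (code != '0' && code != prev) out.append(code);
                if (c != 'H' && c != 'W') prev = code; // H/W do not split runs
            }
            if (out.length() == 4) break;
        }
        while (out.length() > 0 && out.length() < 4) out.append('0');
        return out.toString();
    }

    // The "separate field": phonetic key -> names indexed under that key.
    private final Map<String, List<String>> phoneticField = new HashMap<>();

    // Index time: every token of the name is mapped to its key.
    void index(String name) {
        for (String token : name.split("\\s+")) {
            String k = key(token);
            if (!k.isEmpty()) {
                phoneticField.computeIfAbsent(k, x -> new ArrayList<>()).add(name);
            }
        }
    }

    // Query time: the same mapping, then an exact-match lookup on the key.
    List<String> search(String term) {
        return phoneticField.getOrDefault(key(term), Collections.emptyList());
    }

    public static void main(String[] args) {
        PhoneticKeyDemo idx = new PhoneticKeyDemo();
        idx.index("Leominster Road");
        idx.index("Worcester Street");
        System.out.println(idx.search("leminster")); // prints [Leominster Road]
    }
}
```

The point is only that the same mapping runs on both sides, so the lookup itself is an exact match on the key and needs no fuzzy matching.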
You can combine this with your edit-distance query using a BooleanQuery; for example, the edit distance can handle your le[o]minster just fine. I think this would be much better for you. I wouldn't abuse Levenshtein for phonetic matching; it's not designed for that.

On Mon, Jul 26, 2010 at 1:44 PM, <[email protected]> wrote:

> Clearly you haven’t been in the Northeast much. Try “Worcester” vs.
> “wuster”, or “Leominster” vs. “leminster”. It’s also likely to be a
> challenge to come up with the right phonetics for any given proper location
> name. It’s even worse in Britain, or countries where the phonetic rules
> may be a hodgepodge of different colonial influences.
>
> That having been said, if there exists a “PhoneticQuery” object that does
> all this using the automaton logic under the covers, I think it would be
> worth a serious look.
>
> Karl
>
> *From:* ext Robert Muir [mailto:[email protected]]
> *Sent:* Monday, July 26, 2010 1:24 PM
> *To:* [email protected]
> *Subject:* Re: LevenshteinFilter proposal
>
> On Mon, Jul 26, 2010 at 1:13 PM, <[email protected]> wrote:
>
> What I want to capture is situations where people misspell things in
> roughly a phonetic way. For example, “Tchaikovsky Avenue” might be
> misspelled as “Chicovsky Avenue”. Modules that do phonetic mapping are
> possible but you’d have to somehow generate a phonetic database of (say)
> streetnames, worldwide. Good luck on getting hold of that kind of data
> anywhere. ;-) In the absence of such data, an LD distance will have to do –
> but it will almost certainly need to be greater than 2.
>
> I added this to 'TestPhoneticFilter' and it passes:
>
>     assertAlgorithm(new DoubleMetaphone(), false, "Tchaikovsky Chicovsky",
>         new String[] { "XKFS", "XKFS" });
>
> So if you want to give me all your street names, i can sell you a phonetic
> database, or you can use the filters in modules/analyzers/phonetic, which
> have a bunch of different configurable algorithms :)
>
> --
> Robert Muir
> [email protected]

--
Robert Muir
[email protected]
