Re: LevenshteinFilter proposal

Robert Muir Mon, 26 Jul 2010 10:57:14 -0700

Nah, it's an analyzer, so you can just use TermQuery (fast: exact match).
At query and index time it just maps terms to a key... typically you would
just put this in a separate field.


You can combine this with your edit distance query using a BooleanQuery; for
example, the edit distance query can handle your le[o]minster just fine.

I think this would be much better for you. I wouldn't abuse Levenshtein for
phonetic matching; it's not designed for that.
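
To make the two-field idea concrete, here is a minimal, library-free Java
sketch. It is not Lucene code: phoneticKey is a hypothetical, deliberately
crude stand-in for a real encoder such as DoubleMetaphone, the two collections
stand in for the original field and the separate phonetic field, and search
ORs an exact key lookup with an edit distance check, roughly the way a
BooleanQuery with two optional clauses would. All class and method names here
are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PhoneticOrFuzzySketch {
    // Stand-ins for two index fields: original terms and their phonetic keys.
    private final Set<String> nameField = new TreeSet<>();
    private final Map<String, Set<String>> phoneticField = new HashMap<>();

    // Hypothetical toy key, NOT DoubleMetaphone: fold "tch"/"ch" to 'x',
    // 'c' to 'k', drop vowels, and collapse adjacent repeated letters.
    static String phoneticKey(String term) {
        String s = term.toLowerCase()
                .replace("tch", "x")
                .replace("ch", "x")
                .replace('c', 'k');
        StringBuilder key = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if ("aeiouy".indexOf(ch) >= 0) continue;
            if (key.length() > 0 && key.charAt(key.length() - 1) == ch) continue;
            key.append(ch);
        }
        return key.toString();
    }

    // Plain Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // "Index time": store the term and, in the separate field, its key.
    void index(String name) {
        String term = name.toLowerCase();
        nameField.add(term);
        phoneticField.computeIfAbsent(phoneticKey(term), k -> new TreeSet<>()).add(term);
    }

    // "Query time": exact match on the key OR edit distance within maxEdits.
    Set<String> search(String query, int maxEdits) {
        String q = query.toLowerCase();
        Set<String> hits = new TreeSet<>();
        Set<String> byKey = phoneticField.get(phoneticKey(q));
        if (byKey != null) hits.addAll(byKey);
        for (String term : nameField) {
            if (editDistance(q, term) <= maxEdits) hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        PhoneticOrFuzzySketch idx = new PhoneticOrFuzzySketch();
        idx.index("Tchaikovsky");
        idx.index("Leominster");
        System.out.println(idx.search("Chicovsky", 1));
        System.out.println(idx.search("leminster", 1));
    }
}
```

With these toy rules, "Chicovsky" reaches "tchaikovsky" only through the key
lookup, while "leminster" is within one edit of "leominster". A real setup
would swap in a proper phonetic filter and a real edit distance query.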

On Mon, Jul 26, 2010 at 1:44 PM, <[email protected]> wrote:

>  Clearly you haven’t been in the Northeast much.  Try “Worcester” vs.
> “wuster”, or “Leominster” vs. “leminster”.  It’s also likely to be a
> challenge to come up with the right phonetics for any given proper location
> name.   It’s even worse in Britain, or countries where the phonetic rules
> may be a hodgepodge of different colonial influences.
>
>
>
> That having been said, if there exists a “PhoneticQuery” object that does
> all this using the automaton logic under the covers, I think it would be
> worth a serious look.
>
>
>
> Karl
>
>
>
>
>
> *From:* ext Robert Muir [mailto:[email protected]]
> *Sent:* Monday, July 26, 2010 1:24 PM
>
> *To:* [email protected]
> *Subject:* Re: LevenshteinFilter proposal
>
>
>
>
>
> On Mon, Jul 26, 2010 at 1:13 PM, <[email protected]> wrote:
>
> > What I want to capture is situations where people misspell things in
> > roughly a phonetic way.  For example, “Tchaikovsky Avenue” might be
> > misspelled as “Chicovsky Avenue”.  Modules that do phonetic mapping are
> > possible but you’d have to somehow generate a phonetic database of (say)
> > streetnames, worldwide.  Good luck on getting hold of that kind of data
> > anywhere. ;-)  In the absence of such data, an LD distance will have to do –
> > but it will almost certainly need to be greater than 2.
>
> I added this to 'TestPhoneticFilter' and it passes:
>
>   assertAlgorithm(new DoubleMetaphone(), false, "Tchaikovsky Chicovsky",
>       new String[] { "XKFS", "XKFS" });
>
>
>
> So if you want to give me all your street names, I can sell you a phonetic
> database, or you can use the filters in modules/analyzers/phonetic, which
> have a bunch of different configurable algorithms :)
>
>
> --
> Robert Muir
> [email protected]
>



-- 
Robert Muir
[email protected]
