Thanks Erik. Sorry about the personal post. GroupWise must not be posting as it should.
I see the message locally, but it must not have gone out to the mailing list.
From your description I may have no choice but to hack a custom version of Lucene. I
do think that a "string edit distance" version of PhraseQuery would be beneficial. If
you break your words into character n-grams, it would allow you to search languages
that have no easy stemming algorithms or word boundaries (like Thai, Cambodian,
Laotian, etc.). There are some n-gram-based IR systems out there that show this works
fairly well, for English at least. Since we are only interested in keyword matching,
it does a fair job for the languages we have tried.
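
To make the n-gram idea concrete, here is a minimal sketch of splitting text into
overlapping character n-grams without relying on word boundaries. The class name
CharNGrams and its method are hypothetical and not part of Lucene; in practice this
logic would live inside a custom tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Lucene class): overlapping character n-grams. */
final class CharNGrams {

    /** Returns every run of n consecutive code points in text, in order. */
    static List<String> of(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // Work on code points so characters outside the BMP are never split in half.
        int[] codePoints = text.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= codePoints.length; i++) {
            grams.add(new String(codePoints, i, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(of("lucene", 3)); // prints [luc, uce, cen, ene]
    }
}
```

Text shorter than n yields no n-grams in this sketch, so a real tokenizer would need
a policy for very short input.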
If anybody else has an idea that would allow me to modify PhraseQuery to do a full
"string edit distance" search, I would appreciate it.
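
For reference, this is roughly the measure I have in mind: a plain Levenshtein
distance computed over sequences of terms (words or n-grams) rather than over
characters. It is only a sketch of the distance itself. The class name
TokenEditDistance is hypothetical, and how to wire such a measure into a Lucene
scorer is exactly the open question.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a Lucene class): Levenshtein distance over term sequences. */
final class TokenEditDistance {

    /** Minimum number of term insertions, deletions, and substitutions turning a into b. */
    static int distance(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        // Building the first j terms of b from an empty sequence takes j insertions.
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.size(); i++) {
            // Reducing the first i terms of a to an empty sequence takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.size(); j++) {
                int substitution = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("the", "quick", "brown", "fox");
        List<String> text = Arrays.asList("the", "quick", "red", "fox");
        System.out.println(distance(query, text)); // prints 1
    }
}
```

A phrase would then count as a fuzzy match when this distance falls below some
threshold chosen by the application.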
Jim Hargrave
>>> "Erik Hatcher" <[EMAIL PROTECTED]> 01/08/04 01:43PM >>>
On Jan 7, 2004, at 3:54 PM, Jim Hargrave wrote:
> Looks like I will have to implement my own PhraseQuery that uses a
> standard string edit distance measure. What is the easiest way to do
> this? Should I override PhraseQuery - then override the
> SloppyPhraseScorer? I have my own query parser so I can make any
> adjustments needed when building a query.
Probably best to keep this on the lucene-user e-mail list, but it is
non-trivial to implement a custom Query. While PhraseQuery itself can
be extended, there are several pieces it uses which are currently
scoped at package visibility level only.
Even if you are using the built-in QueryParser, you can override the
method that constructs the PhraseQuery.
> BTW: We have implemented a multilingual keyword-in-context
> application that provides exact, stemmed, and fuzzy search for ANY
> language. Well, we will have fuzzy search when I finish these
> modifications. Lucene rules!
>
Nice!
Erik