Thanks Erik. Sorry about the personal post. GroupWise must not be posting as it should:
I see the message locally, but it must not have gone out to the mailing list.
 
From your description I may have no choice but to hack a custom version of Lucene. I
do think that a "string edit distance" version of PhraseQuery would be beneficial. If
you break your words into character n-grams, it allows you to search languages
that have no easy stemming algorithms or word boundaries (like Thai, Cambodian,
Lao, etc.). Some n-gram-based IR systems out there show that this works
reasonably well, at least for English. Since we are only interested in keyword
matching, it does a fair job for the languages we have tried.
 
If anybody else has an idea that would allow me to modify PhraseQuery to do a full
"string edit distance" search, I would appreciate it.
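
To make the idea concrete, here is a minimal sketch in plain Java of what "string
edit distance" over character n-grams could look like. It is not Lucene code and
does not touch PhraseQuery or SloppyPhraseScorer; the class name NgramEditDistance
and both method names are hypothetical, and unit costs for insert, delete and
substitute are an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
final class NgramEditDistance {

    // Split text into overlapping character n-grams.
    // Works on UTF-16 code units, so characters outside the BMP would be split.
    static List<String> ngrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        if (text.length() < n) {
            // Text shorter than n: keep it as a single token (assumed convention).
            if (!text.isEmpty()) {
                grams.add(text);
            }
            return grams;
        }
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Levenshtein distance between two token sequences.
    // Each insertion, deletion or substitution of one token costs 1.
    static int editDistance(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j; // turning an empty sequence into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i; // turning a[0..i) into an empty sequence takes i deletes
            for (int j = 1; j <= b.size(); j++) {
                int cost = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }
}
```

Calling editDistance(ngrams(query, 3), ngrams(candidate, 3)) gives a token-level
distance that a scorer could compare against a threshold. Wiring that into phrase
matching against an index is the part that would still need the custom query.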
 
Jim Hargrave

>>> "Erik Hatcher" <[EMAIL PROTECTED]> 01/08/04 01:43PM >>>
On Jan 7, 2004, at 3:54 PM, Jim Hargrave wrote:
> Looks like I will have to implement my own PhraseQuery that uses a 
> standard string edit distance measure. What is the easiest way to do 
> this? Should I override PhraseQuery - then override the 
> SloppyPhraseScorer? I have my own query parser so I can make any 
> adjustments needed when building a query.

Probably best to keep this on the lucene-user e-mail list, but it is 
non-trivial to implement a custom Query.   While PhraseQuery itself can 
be extended, there are several pieces it uses which are currently 
scoped at package visibility level only.

Even if you are using the built-in QueryParser, you can override the 
method that constructs the PhraseQuery.

> BTW: We have implemented a multilingual keyword-in-context
> application that provides exact, stemmed and fuzzy search for ANY
> language. Well, we will have fuzzy search when I finish these
> modifications. Lucene rules!
>

Nice!

    Erik



