-g-
[*] Why do you use the label ``fuzzy''? It has nothing to do with fuzzy logic or fuzzy IR, I guess.
Frank Burough wrote:
I have seen some interesting work done on storing DNA sequence as a set of common patterns with unique sequence between them. If one uses an analyzer to break a sequence into its set of patterns and unique sequence, then Lucene could be used to search for exact pattern matches. I know of only one sequence search tool that was based on this approach, and I don't know if it ever left the lab and made it into the mainstream. If I have time I will explore this a bit.
Frank Burough
-----Original Message-----
From: Leo Galambos [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 05, 2003 5:55 PM
To: Lucene Users List
Subject: Re: String similarity search vs. typcial IR application...
AFAIK Lucene is not able to look DNA strings up effectively. You would use DASG+Lev (see my previous post - 05/30/2003 1916CEST).
-g-
Jim Hargrave wrote:
Our application is a string similarity searcher where the query is an input string and we want to find all "fuzzy" variants of the input string in the DB. The score is basically Dice's coefficient: 2C/(Q+D), where C is the number of terms (n-grams) in common, Q is the number of unique query terms and D is the number of unique document terms. Our documents will be sentences.
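As a minimal sketch of the score described above, assuming the terms are unique character n-grams of a lowercased string (the function names `ngrams` and `dice` are illustrative only and are not part of Lucene):

```python
def ngrams(text, n=3):
    """Return the set of unique character n-grams in text (lowercased)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def dice(query, document, n=3):
    """Dice's coefficient 2C/(Q+D) over unique n-grams.

    C = n-grams in common, Q = unique query n-grams,
    D = unique document n-grams.
    """
    q = ngrams(query, n)
    d = ngrams(document, n)
    if not q and not d:
        # Both strings are shorter than n, so there are no terms to compare.
        return 0.0
    c = len(q & d)
    return 2.0 * c / (len(q) + len(d))
```

For example, with bigrams "night" and "nacht" share only "ht" out of four unique bigrams each, so `dice("night", "nacht", n=2)` gives 2*1/(4+4) = 0.25.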
I know Lucene has a fuzzy search capability - but I assume this would be very slow since it must search through the entire term list to find candidates.
In order to do the calculation I will need to have 'C' - the number of terms in common between query and document. Is there an API that I can call to get this info? Any hints on what it will take to modify Lucene to handle these kinds of queries?
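One way to picture how 'C' could be obtained without scanning every document is a plain inverted index from n-gram to document ids: walking the postings of each query n-gram and counting hits per document yields C directly. The sketch below is a toy in-memory version of that idea, not Lucene's API; `build_index` and `search` are hypothetical names, and character trigrams are assumed as the terms.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of unique character n-grams in text (lowercased)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(documents, n=3):
    """Map each n-gram to the ids of documents containing it.

    Also records D, the number of unique n-grams, for every document.
    """
    postings = defaultdict(set)
    doc_sizes = []
    for doc_id, doc in enumerate(documents):
        grams = ngrams(doc, n)
        doc_sizes.append(len(grams))
        for gram in grams:
            postings[gram].add(doc_id)
    return postings, doc_sizes


def search(query, postings, doc_sizes, n=3, min_score=0.0):
    """Return (doc_id, C, score) tuples, best score first.

    Only documents sharing at least one n-gram with the query are touched.
    """
    q = ngrams(query, n)
    common = defaultdict(int)  # doc_id -> C
    for gram in q:
        for doc_id in postings.get(gram, ()):
            common[doc_id] += 1
    results = []
    for doc_id, c in common.items():
        score = 2.0 * c / (len(q) + doc_sizes[doc_id])
        if score >= min_score:
            results.append((doc_id, c, score))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Usage under the same assumptions: after `postings, sizes = build_index(sentences)`, calling `search("some input", postings, sizes, min_score=0.5)` returns only the candidate sentences scoring at least 0.5, each with its C value.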
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]