Joshua
On Friday, Nov 14, 2003, at 10:13 US/Pacific, Chong, Herb wrote:
if you didn't have to change the index then you haven't got all the factors needed to do it well. terms can't cross sentence boundaries and the index doesn't store sentence boundaries.
Herb...
-----Original Message-----
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]
Sent: Friday, November 14, 2003 1:14 PM
To: Lucene Users List
Subject: inter-term correlation [was Re: Vector Space Model in Lucene?]
Incorporating inter-term correlation into Lucene isn't that hard; I've done it. Nor is it incompatible with the vector-space model. I'm not happy with the specific correlation metric that I picked, which is why I'm not eager to generally release the code I wrote, but I think that the basic mechanism that I came up with (query expansion via correlated terms, where the added terms were boosted according to the strength of the correlation) is fairly sound. And I didn't need any changes to Lucene to do this.
You can get some details on the specific mechanism that I used here, if you're interested:
http://www.ics.uci.edu/~jmadden/research/index.html
(and go down to "Fuzzy Term Expansion and Document Reweighting", about halfway down.)
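As a rough illustration of the general mechanism described above (query expansion via correlated terms, with each added term boosted by the strength of its correlation), here is a minimal sketch. It is not the code referred to in this message. The correlation metric (Jaccard overlap of the document sets containing each term), the function names, and the `threshold` and `max_added` parameters are all assumptions made for the example. The only Lucene-specific piece is the output format, which uses the query parser's `term^boost` syntax. Because the metric is computed per document, it ignores sentence boundaries entirely.

```python
from collections import defaultdict


def build_postings(docs):
    """Map each term to the set of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    return postings


def jaccard(a, b):
    """Hypothetical correlation metric: |A & B| / |A | B| over document sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_query(query_terms, postings, threshold=0.3, max_added=3):
    """Return query text with correlated terms appended as term^boost."""
    parts = list(query_terms)
    seen = set(query_terms)
    for q in query_terms:
        q_docs = postings.get(q, set())
        scored = [
            (jaccard(q_docs, term_docs), term)
            for term, term_docs in postings.items()
            if term not in seen
        ]
        # Strongest correlations first; ties broken alphabetically.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for score, term in scored[:max_added]:
            if score >= threshold:
                # The boost is the correlation strength itself.
                parts.append(f"{term}^{score:.2f}")
                seen.add(term)
    return " ".join(parts)


docs = ["lucene index search", "lucene index", "search engine", "lucene query"]
expanded = expand_query(["lucene"], build_postings(docs))
print(expanded)
```

On this toy collection the query `lucene` expands to `lucene index^0.67 query^0.33`. The original term keeps the default boost, and the resulting string could be handed to a query parser without touching the index.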
If you decide that my ideas are interesting enough that you want to have a look at my code, let me know, and perhaps we can work something out.
Regards,
Joshua O'Madadhain
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.
