If you didn't have to change the index, then you haven't got all the factors needed to 
do it well. Terms can't cross sentence boundaries, and the index doesn't store sentence 
boundaries.

Herb...

-----Original Message-----
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 14, 2003 1:14 PM
To: Lucene Users List
Subject: inter-term correlation [was Re: Vector Space Model in Lucene?]


Incorporating inter-term correlation into Lucene isn't that hard; I've 
done it.  Nor is it incompatible with the vector-space model.  I'm not 
happy with the specific correlation metric that I picked, which is why 
I'm not eager to generally release the code I wrote, but I think that 
the basic mechanism that I came up with (query expansion via correlated 
terms, where the added terms were boosted according to the strength of 
the correlation) is fairly sound.  And I didn't need any changes to 
Lucene to do this.
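
To make the expansion step above concrete, here is a minimal sketch in 
Python. It is not the code described in this message: the correlation 
table, the function name expand_query, and the min_strength and max_added 
cutoffs are all made up for illustration, and the correlation metric 
itself is left out entirely. The only Lucene-specific piece is the 
term^boost form of the standard query syntax, which is how the added 
terms get weighted without touching the index.

```python
def expand_query(terms, correlations, min_strength=0.2, max_added=3):
    """Return a Lucene query-syntax string with correlated terms appended.

    `correlations` is a hypothetical lookup table mapping a term to
    {related_term: strength}, with strength assumed to lie in (0, 1].
    How those strengths are computed is outside this sketch.
    """
    original = list(dict.fromkeys(terms))  # drop duplicates, keep order
    added = {}
    for term in original:
        for related, strength in correlations.get(term, {}).items():
            # Skip terms already in the query and weak correlations.
            if related in original or strength < min_strength:
                continue
            # If several query terms suggest the same word, keep the
            # strongest correlation as its boost.
            added[related] = max(added.get(related, 0.0), strength)

    # Strongest correlations first; ties broken alphabetically.
    strongest = sorted(added.items(), key=lambda kv: (-kv[1], kv[0]))
    strongest = strongest[:max_added]

    # Original terms keep the default boost of 1.0; added terms are
    # boosted by correlation strength using the term^boost syntax.
    parts = original + [f"{t}^{s:.2f}" for t, s in strongest]
    return " ".join(parts)


# Illustrative data only; these strengths are invented.
CORRELATIONS = {
    "car": {"automobile": 0.8, "vehicle": 0.5, "road": 0.1},
    "engine": {"motor": 0.7, "vehicle": 0.6},
}

print(expand_query(["car", "engine"], CORRELATIONS))
# prints: car engine automobile^0.80 motor^0.70 vehicle^0.60
```

The resulting string can be handed to the query parser like any other 
query, which is why this kind of expansion can live entirely on the 
query side.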

You can get some details on the specific mechanism that I used here, if 
you're interested:

http://www.ics.uci.edu/~jmadden/research/index.html

(and go down to "Fuzzy Term Expansion and Document Reweighting", about 
halfway down.)

If you decide that my ideas are interesting enough that you want to 
have a look at my code, let me know, and perhaps we can work something 
out.

Regards,

Joshua O'Madadhain
