Re: Lucene and Latent Semantic Indexing

Tarjei Lægreid Tue, 22 Nov 2005 04:27:13 -0800

Hi Andy,

I am also very interested in such approaches. I have tried a hack to
simulate the effects of LSI in a Lucene index. As you suggested, I
extracted the term frequencies from the index, constructed a
term/document matrix, and performed SVD on the matrix. I then multiplied the
resulting values by a constant factor to simulate term frequencies in the
LSI space (that is, I created a new field "lsi" in the documents and added
the words with their corresponding frequencies). However, this is a pretty
nasty hack, and I would appreciate it if anyone knows a good way of applying
LSI to Lucene.
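
To make the scaling step concrete, here is a rough sketch of one way it
could look. It is not the exact code I used. It assumes the SVD and rank
reduction have already been done elsewhere and that each document's
weights are available as a term -> float mapping. The function name
lsi_field_text, the scale factor of 10, and the sample weights are all
made up for illustration. Repeating a term N times in the field text is
just one way to make an analyzer see a term frequency of N; weights that
round to zero or below are simply dropped, which loses information.

```python
def lsi_field_text(weights, scale=10):
    """Build the text for a hypothetical "lsi" field from float weights.

    weights: dict mapping term -> float weight from the rank-reduced matrix
    scale:   assumed constant factor; a real value would need tuning
    """
    tokens = []
    for term, weight in sorted(weights.items()):
        # Term frequencies must be non-negative integers, so scale and
        # round the weight, then skip anything that ends up <= 0.
        pseudo_tf = int(round(weight * scale))
        if pseudo_tf > 0:
            # Repeating the term pseudo_tf times yields that frequency
            # once the field text is tokenized on whitespace.
            tokens.extend([term] * pseudo_tf)
    return " ".join(tokens)


# Made-up weights for a single document, for illustration only.
doc_weights = {"car": 0.82, "automobile": 0.41, "engine": 0.04, "banana": -0.13}
print(lsi_field_text(doc_weights))
```

The resulting string would then be indexed as the "lsi" field with a
whitespace-style analyzer. Negative weights and small weights disappear
in the rounding, which is part of why this remains a hack.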


Are there any plans to include LSI as a Lucene feature in the future?


Regards,
Tarjei

On 11/15/05, Andy Liu <[EMAIL PROTECTED]> wrote:
>
> I'm currently experimenting with latent semantic indexing techniques and
> Lucene. I need to extract term frequencies from a Lucene index and
> construct a document/term matrix, then subsequently perform some
> mathematical algorithms on this matrix which produce float and
> potentially negative term frequency values. Extracting the tf's from the
> Lucene index is easy. The hard part is importing the modified tf's back
> into the index, since in Lucene, tf's are stored as integer values.
>
> Anybody that knows the Lucene codebase well have any tips? Has anybody
> even tried performing LSI on a Lucene index?
>
> Andy
>
>
