Thanks! It is exactly what I need. But isn't there a way to get the matching
score? For example, "damaged" co-occurs with "shipment" with a probability
of 0.4?

On Sun, Aug 22, 2010 at 5:35 AM, Ivan Provalov <iprov...@yahoo.com> wrote:

> Ahmed,
>
> FYI, I updated the term collocations package I mentioned earlier with a few
> fixes and changes which will make it work for Lucene 3.0.2. This may help
> your task.
>
> See:
> https://issues.apache.org/jira/browse/LUCENE-474
>
> Thanks,
>
> Ivan Provalov
>
>
> --- On Sat, 8/21/10, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>
> > From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
> > Subject: Re: Calculate Term Co-occurrence Matrix
> > To: java-user@lucene.apache.org
> > Date: Saturday, August 21, 2010, 8:05 AM
> > Ahmed,
> >
> > That's what that KPE (link in my previous email, below) will do for you.
> > It's not open source at this time, but that is exactly one of the things
> > it does. I think Mahout collocations stuff might work for you, too.
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> >
> >
> > ----- Original Message ----
> > > From: ahmed algohary <algoharya...@gmail.com>
> > > To: java-user@lucene.apache.org
> > > Sent: Sat, August 21, 2010 7:20:03 AM
> > > Subject: Re: Calculate Term Co-occurrence Matrix
> > >
> > > Thanks for all your answers!
> > >
> > > it seems like I did not make my question clear. I have a text corpus
> > > and I need to determine the pairs of words that occur together in many
> > > documents. I need to do that to be able to measure the semantic
> > > proximity between words. This method is expanded
> > > here<http://forums.searchenginewatch.com/showthread.php?t=48>.
> > > I hope to find some code that given a text corpus, generate all the
> > > words pairs with their probability of occurring together.
> > >
> > >
> > > On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> > >
> > > > There is also a non-Mahout Key Phrase Extractor for Collocations,
> > > > SIPs, and a few other things:
> > > > http://sematext.com/products/key-phrase-extractor/index.html
> > > >
> > > > One of the demos that uses news data is at
> > > > http://sematext.com/demo/kpe/index.html
> > > >
> > > > Otis
> > > > ----
> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > Lucene ecosystem search :: http://search-lucene.com/
> > > >
> > > >
> > > >
> > > > ----- Original Message ----
> > > > > From: Grant Ingersoll <gsing...@apache.org>
> > > > > To: java-user@lucene.apache.org
> > > > > Sent: Fri, August 20, 2010 8:52:17 AM
> > > > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > > >
> > > > > You might also be interested in Mahout's collocations package:
> > > > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
> > > > >
> > > > > -Grant
> > > > > On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I need to know if there is a Lucene plug-in or a Lucene-based API
> > > > > > for calculating the term co-occurrence matrix for a given text
> > > > > > corpus.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > --
> > > > > > Ahmed
> > > > >
> > > > > --------------------------
> > > > > Grant Ingersoll
> > > > > http://www.lucidimagination.com/
> > > > >
> > > > > Search the Lucene ecosystem using Solr/Lucene:
> > > > > http://www.lucidimagination.com/search
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
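
P.S. To make the score I am asking about concrete, here is a minimal Python
sketch of one possible definition: the fraction of documents in which both
words appear. This is only an illustration under my own assumptions. It is
not Lucene code and not necessarily how the LUCENE-474 package or Mahout
score collocations; the function name, the whitespace tokenizer, and the toy
corpus are all made up for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_probabilities(docs):
    """Map each word pair (a, b), a < b, to the fraction of documents
    that contain both words (hypothetical helper, naive tokenizer)."""
    pair_counts = Counter()
    for doc in docs:
        # Count each pair at most once per document.
        terms = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(terms, 2))
    n = len(docs)
    return {pair: count / n for pair, count in pair_counts.items()}


# Toy corpus, invented for illustration only.
docs = [
    "damaged shipment arrived late",
    "shipment damaged in transit",
    "shipment arrived on time",
    "invoice sent to customer",
    "customer reported late invoice",
]
probs = cooccurrence_probabilities(docs)
print(probs[("damaged", "shipment")])  # both words occur in 2 of 5 documents
```

Pairs are stored in alphabetical order, so a lookup has to use
("damaged", "shipment") rather than the reverse. The number of pairs grows
quadratically with the distinct words per document, so on a real corpus
something index-based would be needed.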