I think I got it.
In the CollectionIndexer class, I added the co-occurrence score to the
index document as a stored field:
doc.add(new Field("score", collocation.getScore() + "",
        Field.Store.YES, Field.Index.NOT_ANALYZED));
Then, in the CollectionSearcher, the score can be retrieved from the
matching document:
d.get("score")
Is that correct?
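
For reference, here is a minimal plain-Java sketch of one way such a score could be defined: the fraction of documents in which both terms appear. It is only an illustration under my own assumptions, not necessarily how the LUCENE-474 package computes its score. The class name CooccurrenceSketch, the probability method, the naive tokenization, and the toy corpus are all made up for the example. It also mimics the round trip through the index, where the score is written as a String and has to be parsed back into a number.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Lucene or the LUCENE-474 patch.
class CooccurrenceSketch {

    // Returns the fraction of documents that contain both terms
    // (document-level co-occurrence). Terms are expected in lowercase.
    static double probability(List<String> docs, String a, String b) {
        if (docs.isEmpty()) {
            return 0.0;
        }
        int both = 0;
        for (String doc : docs) {
            // Naive tokenization (an assumption): lowercase, split on non-letters.
            Set<String> terms = new HashSet<String>(
                    Arrays.asList(doc.toLowerCase(Locale.ROOT).split("[^a-z]+")));
            if (terms.contains(a) && terms.contains(b)) {
                both++;
            }
        }
        return (double) both / docs.size();
    }

    public static void main(String[] args) {
        // Toy corpus, invented for the example.
        List<String> docs = Arrays.asList(
                "The shipment arrived damaged",
                "Damaged shipment returned to sender",
                "The shipment arrived on time",
                "Invoice sent to customer",
                "Customer reported a damaged box");

        // Mimic the index round trip: the score is stored as a String
        // and must be parsed back before doing arithmetic on it.
        String stored = probability(docs, "damaged", "shipment") + "";
        double score = Double.parseDouble(stored);
        System.out.println("damaged/shipment: " + score);
    }
}
```

With this toy corpus, "damaged" and "shipment" appear together in 2 of the 5 documents, so the value is 0.4.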
On Sun, Aug 22, 2010 at 2:47 PM, ahmed algohary <[email protected]> wrote:
> Thanks! It is exactly what I need. But isn't there a way to get the
> matching score?
>
> For example, "damaged" co-occurs with "shipment" with a probability of 0.4.
>
>
> On Sun, Aug 22, 2010 at 5:35 AM, Ivan Provalov <[email protected]> wrote:
>
>> Ahmed,
>>
>> FYI, I updated the term collocations package I mentioned earlier with a
>> few fixes and changes which will make it work for Lucene 3.0.2. This may
>> help your task.
>>
>> See:
>> https://issues.apache.org/jira/browse/LUCENE-474
>>
>> Thanks,
>>
>> Ivan Provalov
>>
>>
>> --- On Sat, 8/21/10, Otis Gospodnetic <[email protected]> wrote:
>>
>> > From: Otis Gospodnetic <[email protected]>
>> > Subject: Re: Calculate Term Co-occurrence Matrix
>> > To: [email protected]
>> > Date: Saturday, August 21, 2010, 8:05 AM
>> > Ahmed,
>> >
>> > That's what that KPE (link in my previous email, below) will do for you.
>> > It's not open source at this time, but that is exactly one of the things
>> > it does. I think Mahout collocations stuff might work for you, too.
>> >
>> > Otis
>> > ----
>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> > Lucene ecosystem search :: http://search-lucene.com/
>> >
>> >
>> >
>> > ----- Original Message ----
>> > > From: ahmed algohary <[email protected]>
>> > > To: [email protected]
>> > > Sent: Sat, August 21, 2010 7:20:03 AM
>> > > Subject: Re: Calculate Term Co-occurrence Matrix
>> > >
>> > > Thanks for all your answers!
>> > >
>> > > It seems I did not make my question clear. I have a text corpus and I
>> > > need to determine the pairs of words that occur together in many documents.
>> > > I need to do that to be able to measure the semantic proximity between
>> > > words. This method is expanded on here:
>> > > <http://forums.searchenginewatch.com/showthread.php?t=48>.
>> > > I hope to find some code that, given a text corpus, generates all the word
>> > > pairs with their probability of occurring together.
>> > >
>> > >
>> > > On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic <[email protected]> wrote:
>> > >
>> > > > There is also a non-Mahout Key Phrase Extractor for Collocations, SIPs, and
>> > > > a few other things:
>> > > > http://sematext.com/products/key-phrase-extractor/index.html
>> > > >
>> > > > One of the demos that uses news data is at
>> > > > http://sematext.com/demo/kpe/index.html
>> > > >
>> > > > Otis
>> > > > ----
>> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> > > > Lucene ecosystem search :: http://search-lucene.com/
>> > > >
>> > > >
>> > > >
>> > > > ----- Original Message ----
>> > > > > From: Grant Ingersoll <[email protected]>
>> > > > > To: [email protected]
>> > > > > Sent: Fri, August 20, 2010 8:52:17 AM
>> > > > > Subject: Re: Calculate Term Co-occurrence Matrix
>> > > > >
>> > > > > You might also be interested in Mahout's collocations package:
>> > > > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
>> > > > >
>> > > > > -Grant
>> > > > > On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > I need to know if there is a Lucene plug-in or a Lucene-based API for
>> > > > > > calculating the term co-occurrence matrix for a given text corpus.
>> > > > > >
>> > > > > > Thanks!
>> > > > > >
>> > > > > > --
>> > > > > > Ahmed
>> > > > >
>> > > > > --------------------------
>> > > > > Grant Ingersoll
>> > > > > http://www.lucidimagination.com/
>> > > > >
>> > > > > Search the Lucene ecosystem using Solr/Lucene:
>> > > > > http://www.lucidimagination.com/search
>> > > > >
>> > > > >
>> > > > >
>> > > > > ---------------------------------------------------------------------
>> > > > > To unsubscribe, e-mail: [email protected]
>> > > > > For additional commands, e-mail: [email protected]
>> > > > >
>> > > > >