Had a look at it some time ago, but admittedly only skimmed it. Just read it 
again: it looks good, allows dimension reduction with ease, and hence looks 
scalable.

tks

Paul




________________________________
From: Grant Ingersoll <gsing...@apache.org>
To: mahout-user@lucene.apache.org
Sent: Wednesday, 24 June, 2009 12:34:46
Subject: Re: mahout PLSI (with some lucene, thrown in)

Random FYI: http://code.google.com/p/semanticvectors/ came up on the Lucene 
mailing list yesterday and it sounds interesting, plus BSD license...

-Grant

On Jun 23, 2009, at 7:56 PM, Paul Jones wrote:

> Yup, I see that WordNet has also been "ported" to a Lucene index, and hence 
> pulling the hyponyms works great.
> 
> tks
> 
> Paul
> 
> 
> 
> 
> ________________________________
> From: Tommy Chheng <to...@peoplejar.com>
> To: mahout-user@lucene.apache.org
> Sent: Tuesday, 23 June, 2009 23:19:25
> Subject: Re: mahout PLSI (with some lucene, thrown in)
> 
> Have you looked at WordNet to get the hyponyms?
> 
> Tommy
> 
> On Jun 23, 2009, at 3:09 PM, Paul Jones wrote:
> 
>> Okay, have seen the difficulty (apart from the maths :-)).
>> 
>> I guess "similar" can mean many things, i.e. hyponyms, but words such as 
>> hot...cold are also "related". Hence, to solve my little problem, I am 
>> wondering if there is an easier way, i.e. to use existing hyponym 
>> relations (WordNet and the like), and/or, if they do not exist, then I 
>> guess using something similar to a "Google distance measure" may help in 
>> "adding" new words to the system....
>> 
>> Paul
>> 
>> 
>> 
>> 
>> ________________________________
>> From: Ted Dunning <ted.dunn...@gmail.com>
>> To: mahout-user@lucene.apache.org
>> Sent: Tuesday, 23 June, 2009 18:00:12
>> Subject: Re: mahout PLSI (with some lucene, thrown in)
>> 
>> Yes.  This can be done.  It isn't necessarily real simple to do.
>> 
>> See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.7275 for an
>> old (but still pretty good) example.
>> 
>> On Tue, Jun 23, 2009 at 6:45 AM, Paul Jones <paul_jone...@yahoo.co.uk> wrote:
>> 
>>> Imagine we have crawled 100K webpages, and we have 100 pages which show
>>> "red", 100 which show "crimson", and then 100 which show both "red" and
>>> "crimson". Is there a way to deduce that there may be an (albeit weak)
>>> relationship between red AND crimson? Of course we can pre-seed this info,
>>> which then gets weighted by actual results.
>>> 
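The red/crimson question above can be made concrete with a small worked
example. The sketch below is not from the thread: it assumes the three groups
of 100 pages are disjoint (100 with only "red", 100 with only "crimson", 100
with both), and it picks two common association scores for a 2x2 co-occurrence
table, pointwise mutual information (PMI) and the log-likelihood ratio (G^2).
The function names pmi and llr are hypothetical, chosen for illustration only.

```python
import math

def pmi(k11, k12, k21, k22):
    """Pointwise mutual information (in bits) of two terms.

    k11: pages with both terms, k12: first term only,
    k21: second term only, k22: pages with neither term.
    """
    n = k11 + k12 + k21 + k22
    first_total = k11 + k12    # pages containing the first term
    second_total = k11 + k21   # pages containing the second term
    if k11 == 0:
        return float("-inf")   # the terms never co-occur
    # log2 of observed co-occurrence over the count expected under independence
    return math.log2(k11 * n / (first_total * second_total))

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) statistic for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = (
        (k11, rows[0], cols[0]),
        (k12, rows[0], cols[1]),
        (k21, rows[1], cols[0]),
        (k22, rows[1], cols[1]),
    )
    total = 0.0
    for k, r, c in cells:
        if k > 0:              # a zero cell contributes nothing to the sum
            total += k * math.log(k * n / (r * c))
    return 2.0 * total

if __name__ == "__main__":
    # Assumed reading of the question: 100,000 pages in total,
    # 100 both, 100 "red" only, 100 "crimson" only, the rest neither.
    both, red_only, crimson_only = 100, 100, 100
    neither = 100_000 - both - red_only - crimson_only
    print("PMI (bits):", pmi(both, red_only, crimson_only, neither))
    print("G^2:", llr(both, red_only, crimson_only, neither))
```

Under this reading, both scores come out far above what independent terms
would give (0 for either score), so with these counts the relationship looks
strong rather than weak. A pre-seeded prior, as suggested in the question,
could be blended with such a score, but how to weight it is left open here.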
>> 
>> 
>> 
> 
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search


      
