Have you looked at WordNet to get the hyponyms?
Tommy
On Jun 23, 2009, at 3:09 PM, Paul Jones wrote:
Okay, I have seen the difficulty (apart from the maths :-)).
I guess "similar" can mean many things, e.g. hyponyms, but words
such as hot...cold are also "related". So, to solve my little
problem, I am wondering if there is an easier way, e.g. to use
existing hyponym relations (WordNet and the like), and/or, where
those do not exist, I guess using something similar to a "Google
distance measure" may help in "adding" new words to the system....
Paul
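
A rough sketch of the kind of "Google distance measure" mentioned above, assuming the Normalized Google Distance (NGD) of Cilibrasi and Vitanyi is what is meant. The function name and the counts are illustrative only: the counts assume the crawl described in the quoted question further down the thread, read as 200 pages containing "red", 200 containing "crimson", 100 containing both, out of 100K pages.

```python
import math

def normalized_google_distance(f_x, f_y, f_xy, n_total):
    """NGD from document counts: 0 means the terms always co-occur,
    larger values mean they are less related.

    f_x, f_y : number of pages containing each term
    f_xy     : number of pages containing both terms
    n_total  : total number of pages indexed
    """
    if f_xy == 0:
        # The terms never co-occur, so the distance is unbounded.
        return float("inf")
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n_total) - min(log_x, log_y))

# Assumed counts: 100 "red"-only + 100 both = 200 pages for each term.
distance = normalized_google_distance(200, 200, 100, 100_000)
print(f"NGD(red, crimson) = {distance:.3f}")
```

A small distance for a word pair that WordNet does not link could be used as a hint to add the pair to the system, with the threshold left as a tuning decision.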
________________________________
From: Ted Dunning <ted.dunn...@gmail.com>
To: mahout-user@lucene.apache.org
Sent: Tuesday, 23 June, 2009 18:00:12
Subject: Re: mahout PLSI (with some lucene, thrown in)
Yes. This can be done. It isn't necessarily real simple to do.
See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.7275 for an
old (but still pretty good) example.
On Tue, Jun 23, 2009 at 6:45 AM, Paul Jones
<paul_jone...@yahoo.co.uk> wrote:
Imagine we have crawled 100K webpages, and we have 100 pages which show
"red", 100 which show "crimson", and then 100 which show both "red" and
"crimson". Is there a way to deduce that there may be an (albeit weak)
relationship between red AND crimson? Of course we can pre-seed this info,
which then gets weighted by actual results.
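
One simple way to put a number on that example is pointwise mutual information (PMI) over page counts. This is only a sketch, not necessarily the method in the paper linked above; the function name is made up for illustration, and it assumes the three groups of 100 pages are disjoint, so each word appears on 200 pages in total.

```python
import math

def pmi(n_x, n_y, n_xy, n_total):
    """Pointwise mutual information, in bits, from page counts.

    Positive values mean the two words co-occur more often than
    they would if they were independent.
    """
    if n_xy == 0:
        return float("-inf")
    p_x = n_x / n_total
    p_y = n_y / n_total
    p_xy = n_xy / n_total
    return math.log2(p_xy / (p_x * p_y))

# Counts from the example, under the disjoint-groups assumption.
n_total = 100_000
red_only, crimson_only, both = 100, 100, 100
n_red = red_only + both          # pages containing "red"
n_crimson = crimson_only + both  # pages containing "crimson"

score = pmi(n_red, n_crimson, both, n_total)
print(f"PMI(red, crimson) = {score:.2f} bits")
```

Under these assumptions the two words co-occur about 250 times more often than chance would predict, so the score is strongly positive. PMI is known to overrate rare words, so with small counts a pre-seeded prior or a significance test would be a sensible safeguard.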