Hi to one and all

This is my first time on this list. I have read through the wiki, FAQ and 
other docs, but before I dive further into Mahout I have a few questions, or 
should I say clarifications.
I am looking for a system which would allow me to:

1. Take a set of words
2. Build clusters of these words, i.e. work out the semantic relationships 
between the words (I guess I could use WordNet as a starter).
3. Once clusters of words have been formed, also work out the relationships 
between the clusters themselves.
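
To make steps 1 to 3 concrete, here is a rough sketch of what I have in mind. 
It is plain Python rather than Mahout, the word vectors are made-up numbers, 
and the greedy centroid clustering and the names `cluster_words` and 
`centroid` are just my own illustration, not anything taken from the Mahout 
docs.

```python
import math

# Hypothetical toy vectors (made-up numbers): each word is described by
# how strongly it is associated with three arbitrary context features.
VECTORS = {
    "red":     [0.9, 0.1, 0.0],
    "crimson": [0.8, 0.2, 0.0],
    "scarlet": [0.85, 0.15, 0.0],
    "car":     [0.0, 0.1, 0.9],
    "truck":   [0.1, 0.0, 0.95],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(members, vectors):
    """Component-wise mean of the vectors of the given words."""
    dims = len(vectors[members[0]])
    return [sum(vectors[w][i] for w in members) / len(members)
            for i in range(dims)]

def cluster_words(vectors, threshold=0.9):
    """Greedy single-pass clustering: a word joins the first cluster whose
    centroid is at least `threshold` similar, else it starts a new cluster."""
    clusters = []
    for word, vec in vectors.items():
        for members in clusters:
            if cosine(vec, centroid(members, vectors)) >= threshold:
                members.append(word)
                break
        else:
            clusters.append([word])
    return clusters

# Step 2: group the words.
clusters = cluster_words(VECTORS)

# Step 3: relate the clusters to each other via their centroids.
cluster_similarity = cosine(centroid(clusters[0], VECTORS),
                            centroid(clusters[1], VECTORS))
print(clusters, round(cluster_similarity, 3))
```

With these toy numbers the colour words end up in one cluster and the vehicle 
words in another, and the similarity between the two centroids is low.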

So in essence I could work out that red is similar to crimson, and hence a 
search on red would return docs containing crimson even though red is not 
mentioned in them.
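
A minimal sketch of the search behaviour I am after, assuming the word 
clusters already exist. The clusters, documents and the `expand` and `search` 
helpers below are all hypothetical and hand-written for illustration; this is 
not Mahout or Lucene code.

```python
# Hypothetical word clusters; in practice these would come from a
# clustering step or from a resource such as WordNet.
CLUSTERS = [
    {"red", "crimson", "scarlet"},
    {"car", "automobile"},
]

# Hypothetical document collection.
DOCS = {
    1: "a crimson sunset over the bay",
    2: "the red car stopped",
    3: "green fields in spring",
}

def expand(term):
    """Return the term together with every word in its cluster."""
    for cluster in CLUSTERS:
        if term in cluster:
            return set(cluster)
    return {term}

def search(term):
    """Return ids of docs containing the term or any word from its cluster."""
    terms = expand(term)
    return sorted(doc_id for doc_id, text in DOCS.items()
                  if terms & set(text.lower().split()))

print(search("red"))
```

Here a search on "red" also returns document 1, which only mentions 
"crimson".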

Would Mahout work here?

Of course, prior to this there is the problem of cleaning up the data, e.g. 
stemming.

Now I have read several detailed papers on clustering, ranking, etc, and of 
course some algos are better than others, but to me a platform like Mahout 
seems interesting since you can deploy the existing ones in the system, and 
also later on add others.

Looking at the algorithms, it seems that LSI (PLSI) has not been implemented 
yet. If so, which other algo would "suffice" in this case? Admittedly my 
knowledge of algos is poor, to say the least :-). Also, where would Lucene fit 
in (if it does)? Would it be used to search the results after the algos have 
been applied, since it seems that Lucene just uses a weighting system to 
create the index? Or can Mahout do it all?

As you can see, I am confused, but this is my first pass at this system.

tks

Paul

P.S. Are any of the algos feedback algos, i.e. ones where someone could 
improve the results using user feedback?
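
To illustrate the kind of feedback I mean, here is a small sketch of a 
Rocchio-style query update, which is the classic relevance feedback technique 
I have seen in the IR literature. I am not claiming Mahout implements it; the 
vocabulary, vectors and the `rocchio` function are my own illustration, and 
the weights alpha=1.0, beta=0.75, gamma=0.15 are just commonly quoted 
defaults.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector towards docs the user marked relevant and
    away from docs marked non-relevant. Negative weights are clipped to 0."""
    def mean(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(v[i] for v in vectors) / len(vectors)
                for i in range(len(query))]

    rel, non = mean(relevant), mean(non_relevant)
    return [max(0.0, alpha * q + beta * r - gamma * n)
            for q, r, n in zip(query, rel, non)]

# Hypothetical vocabulary order: ["red", "crimson", "green"]
query = [1.0, 0.0, 0.0]                           # user searched for "red"
relevant = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]     # docs the user liked
non_relevant = [[0.0, 0.0, 1.0]]                  # doc the user rejected

updated = rocchio(query, relevant, non_relevant)
print(updated)
```

After the update the query carries weight for "crimson" as well, because the 
user marked documents containing it as relevant.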


      
