Re: String clustering and other newbie questions

Adil Aijaz Fri, 28 Aug 2009 09:41:52 -0700

Juan Francisco Contreras Gaitan wrote:

Hello,


I would like to do some clustering using Hadoop, and I found Mahout. I am
really impressed, but as a newbie I got stuck and have several questions. The
idea is to do string clustering: I have resources whose property values are
expressed as strings, and I would like to aggregate these resources. I use
Eclipse as my IDE, and I have two working Mahout projects, one with the release
version (0.1) and the other with the SVN version. I am able to compile the
examples and run them on my own Hadoop cluster. I have focused on the Synthetic
Control Data example using the Canopy algorithm because of its similarity to my
problem.

- On the release version with default parameter values, all the items end up in
the same cluster (C1). Is this normal?

There was an issue with Hadoop 0.19 and above running combiners on both the map side and the reduce side, which causes this behavior in the released code. Your best bet would be to use the trunk version.
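
To illustrate why an extra combiner pass can change results, here is a minimal Python sketch. It is not Mahout's code and does not reproduce the actual bug; the function names, the data, and the way the second pass is simulated are all assumptions made for illustration. It only shows the general hazard: a combiner that discards information (here, the count behind a mean) gives a different answer when it runs more than once, while one that carries (sum, count) pairs does not.

```python
def unsafe_combine(values):
    """Collapse numbers to their mean; the count behind it is lost."""
    return [sum(values) / len(values)]


def safe_combine(pairs):
    """Merge (sum, count) pairs; applying it again is harmless."""
    return [(sum(s for s, _ in pairs), sum(c for _, c in pairs))]


def run(map_outputs, combine, reduce_side_pass):
    # Map-side combine: applied once to each map output.
    partials = [p for chunk in map_outputs for p in combine(chunk)]
    if reduce_side_pass:
        # Hypothetical second pass over the first two partial results,
        # standing in for a combiner that also runs on the reduce side.
        partials = combine(partials[:2]) + partials[2:]
    return partials


# Illustrative data: three map outputs whose true overall mean is 7.2.
DATA = [[1.0, 2.0, 3.0], [10.0], [20.0]]


def mean_unsafe(reduce_side_pass):
    partials = run(DATA, unsafe_combine, reduce_side_pass)
    # The "reducer" averages whatever partial means it receives.
    return sum(partials) / len(partials)


def mean_safe(reduce_side_pass):
    chunks = [[(v, 1) for v in chunk] for chunk in DATA]
    partials = run(chunks, safe_combine, reduce_side_pass)
    total, count = safe_combine(partials)[0]
    return total / count


if __name__ == "__main__":
    print("unsafe:", mean_unsafe(False), mean_unsafe(True))
    print("safe:  ", mean_safe(False), mean_safe(True))
```

In this sketch the unsafe version returns different values depending on whether the second pass happens, and the safe version returns 7.2 either way. Whether the released Canopy code failed in this particular manner is not something the sketch claims.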


adil
