Juan Francisco Contreras Gaitan wrote:
Hello,

I would like to do some clustering using Hadoop, and I found Mahout. I am 
really impressed, but as a newbie I got stuck and have several questions. The 
idea is to do string clustering: I have property values, expressed as strings, 
for some resources, and I would like to aggregate these resources. I use Eclipse 
as my IDE, and I have two working Mahout projects, one with the release version 
(0.1) and the other with the SVN version. I am able to compile the examples and 
run them on my own Hadoop cluster. I have focused on the Synthetic Control Data 
example with the Canopy algorithm because of its similarity to my problem.

- On the release version with default parameter values, all the items end up in 
the same cluster (C1). Is that normal?
There was an issue with Hadoop 0.19 and above, which can run combiners on both the map side and the reduce side; that causes this behavior in the released code. Your best bet is to use the trunk version.
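
To illustrate the general problem (this is not the Mahout code, just a minimal sketch with made-up values and hypothetical names such as CombinerDemo), here is what can go wrong when a job assumes its combiner runs exactly once. Hadoop treats a combiner as an optimization that may be applied zero, one, or several times, so its output has to be safe to combine again. A combiner that collapses values to a mean is not; one that carries (sum, count) pairs is.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation; no Hadoop classes are used and all names are hypothetical.
class CombinerDemo {
    // Made-up map outputs for a single key, split across two map-side spills.
    static final List<Double> SPILL_A = Arrays.asList(1.0, 2.0, 3.0);
    static final List<Double> SPILL_B = Arrays.asList(6.0);

    // Unsafe combiner: replaces its inputs with their mean and drops the count.
    static double mean(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Combine once per spill (map side), then combine the results again (reduce side).
    static double naiveTwoStage() {
        double a = mean(SPILL_A);
        double b = mean(SPILL_B);
        return mean(Arrays.asList(a, b));
    }

    // Safe combiner, step 1: turn raw values into a (sum, count) partial.
    static double[] partial(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return new double[] {sum, values.size()};
    }

    // Safe combiner, step 2: merging partials gives another partial,
    // so this step can run any number of times without changing the result.
    static double[] merge(List<double[]> partials) {
        double sum = 0.0;
        double count = 0.0;
        for (double[] p : partials) {
            sum += p[0];
            count += p[1];
        }
        return new double[] {sum, count};
    }

    static double safeTwoStage() {
        double[] merged = merge(Arrays.asList(partial(SPILL_A), partial(SPILL_B)));
        return merged[0] / merged[1];
    }

    public static void main(String[] args) {
        System.out.println("naive: " + naiveTwoStage()); // prints "naive: 4.0"
        System.out.println("safe:  " + safeTwoStage());  // prints "safe:  3.0"
    }
}
```

The true mean of 1, 2, 3 and 6 is 3.0; the naive version returns 4.0 because the second combine pass weights each spill equally regardless of how many values it held.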

adil
