Juan Francisco Contreras Gaitan wrote:
Hello,
I would like to do some clustering using Hadoop, and I found Mahout. I am
really impressed, but as a newbie I got stuck and I have several questions. The
idea is to do string clustering: I have some resources whose property values
are expressed as strings, and I would like to group these resources. I use
Eclipse as my IDE, and I have two working Mahout projects, one with the release
version (0.1) and the other with the SVN version. I am able to compile the
examples and run them on my own Hadoop cluster. I have focused on the Synthetic
Control Data example using the Canopy algorithm because of its similarity to my
problem.
- On the release version, with the default parameter values, I get all the
items in the same cluster (C1). Is this normal?
There was an issue with Hadoop 0.19 and above running combiners on both
the map side and the reduce side, which causes this behavior in the
released code. Your best bet is to use the trunk version.
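For anyone new to Canopy, here is a minimal sketch of the textbook sequential canopy step. It assumes numeric vectors with Euclidean distance, and the names (`CanopySketch`, `canopies`, `distance`) are hypothetical. This is not Mahout's code or API. For string properties you would need a string distance (for example, edit distance) in place of `distance`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of textbook canopy clustering, NOT Mahout's implementation.
public class CanopySketch {

    // Euclidean distance between two equal-length vectors.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns one list of point indices per canopy. Requires t1 > t2 > 0.
    static List<List<Integer>> canopies(double[][] points, double t1, double t2) {
        if (t2 <= 0 || t1 <= t2) {
            throw new IllegalArgumentException("need T1 > T2 > 0");
        }
        List<Integer> remaining = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            remaining.add(i);
        }
        List<List<Integer>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // The first remaining point becomes the canopy center.
            double[] center = points[remaining.get(0)];
            // Loose threshold T1: remaining points this close join the canopy.
            List<Integer> canopy = new ArrayList<>();
            for (int idx : remaining) {
                if (distance(center, points[idx]) < t1) {
                    canopy.add(idx);
                }
            }
            // Tight threshold T2: points this close (including the center itself)
            // are removed, so they cannot start or join later canopies.
            Iterator<Integer> it = remaining.iterator();
            while (it.hasNext()) {
                if (distance(center, points[it.next()]) < t2) {
                    it.remove();
                }
            }
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] points = { {0.0, 0.0}, {0.5, 0.0}, {10.0, 10.0}, {10.5, 10.0} };
        System.out.println(canopies(points, 3.0, 1.5));    // [[0, 1], [2, 3]]
        System.out.println(canopies(points, 100.0, 50.0)); // [[0, 1, 2, 3]]
    }
}
```

The second call shows that T1/T2 values much larger than the distances in the data also put every point into a single canopy.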
adil