String clustering and other newbie questions

Juan Francisco Contreras Gaitan Fri, 28 Aug 2009 08:27:37 -0700

Hello,

I would like to do some clustering using Hadoop, and I found Mahout. I am 
really impressed, but as a newbie I got stuck and I have several questions. The 
idea is to do string clustering: I have resources whose property values are 
expressed as strings, and I would like to group these resources. I use Eclipse 
as my IDE, and I have two working Mahout projects, one with the release version 
(0.1) and the other with the SVN version. I am able to compile the examples and 
run them on my own Hadoop cluster. I have focused on the Synthetic Control Data 
example using the Canopy algorithm because of its similarity to my problem.


- On the release version, with the default parameter values, all the items end 
up in the same cluster (C1). Is that normal?
- On the SVN version I don't get readable output because no OutputDriver is 
implemented. If I use the one from the release version, I get exceptions (I 
think the format has changed between releases, for example using the '{' symbol 
instead of '[').
- I use string values instead of double values. I have implemented my own 
string distance that returns a double when the parameters are strings, but I 
think that Mahout Vectors are implemented to store only double values. Is there 
any way to use string values?
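
To make the last question concrete, here is a minimal sketch in plain Java of 
canopy clustering driven directly by a string distance. It uses no Mahout 
classes and is not Mahout's implementation. Levenshtein edit distance is only 
an assumed stand-in for the custom string distance, and the class name, method 
names and thresholds (t1, t2) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: canopy clustering over raw strings, no Mahout classes.
class StringCanopySketch {

    // Levenshtein edit distance, returned as a double like a distance measure.
    // Assumption: stands in for the custom string distance, which is not shown.
    static double levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Basic canopy pass with loose threshold t1 and tight threshold t2 (t1 > t2).
    static List<List<String>> canopies(List<String> items, double t1, double t2) {
        List<String> remaining = new ArrayList<>(items);
        List<List<String>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            String center = remaining.remove(0); // next unassigned item is a center
            List<String> canopy = new ArrayList<>();
            canopy.add(center);
            List<String> next = new ArrayList<>();
            for (String s : remaining) {
                double d = levenshtein(center, s);
                if (d < t1) {
                    canopy.add(s);  // loosely close: joins this canopy
                }
                if (d >= t2) {
                    next.add(s);    // not tightly close: may still seed or join others
                }
            }
            remaining = next;
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> items = List.of("apple", "apples", "apply", "banana", "bananas");
        System.out.println(canopies(items, 3.0, 2.0));
        System.out.println(canopies(items, 100.0, 50.0));
    }
}
```

In this sketch, thresholds that are large relative to the distances in the data 
put every item into a single canopy. That may or may not be related to the 
single-cluster result described in the first question.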

I would be very grateful if anyone could help me.

Thank you very much in advance.

Regards,
jfcg

