Juan Francisco Contreras Gaitan wrote:
Hello,

I would like to do some clustering using Hadoop, and I found Mahout. I am 
really impressed, but as a newbie I got stuck and have several questions. The 
idea is to do string clustering: I have property values of some resources, 
expressed as strings, and I would like to aggregate these resources. I use 
Eclipse as my IDE, and I have two working Mahout projects, one with the release 
version (0.1) and the other with the SVN version. I am able to compile the 
examples and run them on my own Hadoop cluster. I have focused on the Synthetic 
Control Data example using the Canopy algorithm because of its similarity to 
my problem.

- On the release version with default parameter values, I get all the items in 
the same cluster (C1). Is this normal?
Are you running the Synthetic Control example data here? I just ran that example on trunk, and it should produce 6 clusters in one file. The output is binary encoded, though, and difficult to interpret as text. If you search for the string 'SparseVector' in the canopies/part-0000 file you should see six instances.
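One way to do that search without a text editor is a small byte-level scan. The following is only a sketch: the class name MarkerCounter is made up for illustration, it assumes the part file has already been copied from HDFS to local disk, and it counts every occurrence of the marker bytes, so any match in the file header would be counted as well.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Mahout: counts how often an ASCII marker
// such as "SparseVector" appears in the raw bytes of a binary-encoded file.
public class MarkerCounter {

    // Returns the number of non-overlapping occurrences of marker in data.
    public static int count(byte[] data, byte[] marker) {
        if (marker.length == 0) {
            return 0;
        }
        int hits = 0;
        for (int i = 0; i + marker.length <= data.length; i++) {
            int j = 0;
            while (j < marker.length && data[i + j] == marker[j]) {
                j++;
            }
            if (j == marker.length) {
                hits++;
                i += marker.length - 1; // skip past this match
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: local copy of the output file; args[1]: marker (optional)
        String marker = args.length > 1 ? args[1] : "SparseVector";
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int hits = count(data, marker.getBytes(StandardCharsets.US_ASCII));
        System.out.println(marker + " occurrences: " + hits);
    }
}
```

Run it with the local path of the part file as the first argument. It reads the whole file into memory, which should be fine for a small example output like this one.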
- On the SVN version I don't get readable output because no OutputDriver is 
implemented. If I use the one from the release version, I get exceptions (I 
think the format has changed between releases, for example using the '{' symbol 
instead of '[').
The output formats of all the clustering routines are now sequence files, which are binary encoded. The old OutputDriver won't handle them.
- I use string values instead of double values. I have implemented my own 
string distance that returns a double when the parameters are strings, but I 
think that Mahout Vectors are implemented to store only double values. Is there 
any chance to use string values?
Vectors are double only, and you will need to massage your data into numeric format to use the out-of-the-box clustering. Is there a way to convert your property values into doubles?
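As one possible illustration of such a conversion, the sketch below hashes overlapping character n-grams of a string into a fixed-length array of counts. This is an assumption about what might suit string property values, not something Mahout provides: the class StringFeatureEncoder, its methods, and the choice of 64 dimensions and bigrams are all made up for the example, and the resulting double[] would still have to be copied into a Mahout vector. Hash collisions can make unrelated strings look closer than they are, and distances in this space only approximate a custom string distance.

```java
import java.util.Locale;

// Hypothetical encoder, not part of Mahout: turns a string into a
// fixed-length double[] by hashing its character n-grams into buckets.
public class StringFeatureEncoder {

    // Each n-gram of the lower-cased value increments one bucket.
    public static double[] encode(String value, int dimensions, int n) {
        double[] features = new double[dimensions];
        String s = value.toLowerCase(Locale.ROOT);
        if (s.length() < n) {
            // Too short for an n-gram: hash the whole string, if any.
            if (!s.isEmpty()) {
                features[Math.floorMod(s.hashCode(), dimensions)] += 1.0;
            }
            return features;
        }
        for (int i = 0; i + n <= s.length(); i++) {
            String gram = s.substring(i, i + n);
            features[Math.floorMod(gram.hashCode(), dimensions)] += 1.0;
        }
        return features;
    }

    // Plain Euclidean distance between two arrays of equal length.
    public static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] a = encode("hadoop", 64, 2);
        double[] b = encode("hadoob", 64, 2);
        double[] c = encode("clustering", 64, 2);
        System.out.println("hadoop vs hadoob:     " + euclidean(a, b));
        System.out.println("hadoop vs clustering: " + euclidean(a, c));
    }
}
```

Strings that share most of their n-grams end up with similar count arrays, so a standard numeric distance measure can then be applied to them.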
I would be very grateful if anyone could help me.
I'm going to be working on converting clustering to Hadoop 0.20 in the next few weeks. Let's continue our dialog.
Thank you very much in advance.

Regards,
jfcg
