RE: Fields needed after clustering but not used within Mahout

Jeff Eastman Fri, 15 Jul 2011 10:06:27 -0700

Ok, so if you wrap your vector data in a NamedVector:
NV(<id1>,[1,2,3,4,5,6])
NV(<id2>,[1,2,3,4,5,6])
NV(<id3>,[2,3,3,4,5,7])


And keep another index file:
<id1>, (BOB, Chicago)
<id2>, (PHIL, Miami)
<id3>, (Cindy, NY)

Then what you will get out of clustering will be:
NV(<id1>,[1,2,3,4,5,6])      is Cluster 0
NV(<id2>,[1,2,3,4,5,6])      is Cluster 0
NV(<id3>,[2,3,3,4,5,7])      is Cluster 1

Finally you can join them back together to get:
1,2,3,4,5,6,BOB, Chicago     is Cluster 0
1,2,3,4,5,6,PHIL, Miami      is Cluster 0
2,3,3,4,5,7,Cindy, NY        is Cluster 1
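
The join in that last step can be sketched roughly as below. This is plain Python and not Mahout's API: the dictionaries are hypothetical stand-ins for the named vectors, the side index file, and the cluster assignments read back from the clustering output, and `join_clusters` is an illustrative helper name. The only real requirement is that all three are keyed by the same id.

```python
# Hypothetical stand-ins (plain Python, not Mahout classes).
# Vectors keyed by record id, the way a NamedVector carries its name.
vectors = {
    "id1": [1, 2, 3, 4, 5, 6],
    "id2": [1, 2, 3, 4, 5, 6],
    "id3": [2, 3, 3, 4, 5, 7],
}

# Side index holding the fields that were left out of clustering.
index = {
    "id1": ("BOB", "Chicago"),
    "id2": ("PHIL", "Miami"),
    "id3": ("Cindy", "NY"),
}

# Cluster assignments keyed by the same ids (assumed to have been
# parsed from the clustering output beforehand).
assignments = {"id1": 0, "id2": 0, "id3": 1}


def join_clusters(vectors, index, assignments):
    """Join features, side fields, and cluster id on the shared record id."""
    rows = []
    for record_id, cluster in assignments.items():
        features = vectors[record_id]
        extra = index[record_id]
        # One flat row: features, then the extra fields, then the cluster.
        rows.append((*features, *extra, cluster))
    return rows


for row in join_clusters(vectors, index, assignments):
    *fields, cluster = row
    print(",".join(str(f) for f in fields), "is Cluster", cluster)
```

For data too large to hold in memory, the same join on id would more likely be done as a MapReduce or database join, but the idea is the same.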

-----Original Message-----
From: dbg [mailto:[email protected]] 
Sent: Friday, July 15, 2011 6:46 AM
To: [email protected]
Subject: Re: Fields needed after clustering but not used within Mahout

To elaborate further...

The data I am clustering is:
1,2,3,4,5,6,BOB,Chicago
1,2,3,4,5,6,PHIL,Miami
2,3,3,4,5,7,Cindy,NY

The data I vectorize and send through Mahout/k-means is:
1,2,3,4,5,6
1,2,3,4,5,6
2,3,3,4,5,7

The data I get back is:
1,2,3,4,5,6      is Cluster 0
1,2,3,4,5,6      is Cluster 0
2,3,3,4,5,7      is Cluster 1


The data I want is:

1,2,3,4,5,6,BOB, Chicago   is Cluster 0
1,2,3,4,5,6,PHIL, Miami     is Cluster 0
2,3,3,4,5,7,Cindy, NY        is Cluster 1

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fields-needed-after-clustering-but-not-used-within-Mahout-tp3170297p3171977.html
Sent from the Mahout Developer List mailing list archive at Nabble.com.
