Ok, so if you wrap your vector data in a NamedVector:

NV(<id1>,[1,2,3,4,5,6])
NV(<id2>,[1,2,3,4,5,6])
NV(<id3>,[2,3,3,4,5,7])

And keep another index file:

<id1>, (BOB, Chicago)
<id2>, (PHIL, Miami)
<id3>, (Cindy, NY)

Then what you will get out of clustering will be:

NV(<id1>,[1,2,3,4,5,6]) is Cluster 0
NV(<id2>,[1,2,3,4,5,6]) is Cluster 0
NV(<id3>,[2,3,3,4,5,7]) is Cluster 1

Finally you can join them back together to get:

1,2,3,4,5,6,BOB, Chicago is Cluster 0
1,2,3,4,5,6,PHIL, Miami is Cluster 0
2,3,3,4,5,7,Cindy, NY is Cluster 1

-----Original Message-----
From: dbg [mailto:[email protected]]
Sent: Friday, July 15, 2011 6:46 AM
To: [email protected]
Subject: Re: Fields needed after clustering but not used within Mahout

To elaborate further...

The data I am clustering is:

1,2,3,4,5,6,BOB,Chicago
1,2,3,4,5,6,PHIL,Miami
2,3,3,4,5,7,Cindy,NY

The data I vector and send through Mahout/Kmeans is:

1,2,3,4,5,6
1,2,3,4,5,6
2,3,3,4,5,7

That data I get back is:

1,2,3,4,5,6 is Cluster 0
1,2,3,4,5,6 is Cluster 0
2,3,3,4,5,7 is Cluster 1

The data I want is:

1,2,3,4,5,6,BOB, Chicago is Cluster 0
1,2,3,4,5,6,PHIL, Miami is Cluster 0
2,3,3,4,5,7,Cindy, NY is Cluster 1

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/Fields-needed-after-clustering-but-not-used-within-Mahout-tp3170297p3171977.html
Sent from the Mahout Developer List mailing list archive at Nabble.com.
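
P.S. In case it helps, here is a rough sketch of the join-by-id step suggested in the reply at the top of this message. It is plain Python and does not call Mahout at all. The ids ("id1" etc.), the dictionary layout, and the join_results helper are all made up for illustration, and the cluster assignments are written in by hand rather than read from real clustering output. The only point is that the id carried as the vector's name is the join key back to the fields you kept out of clustering.

```python
# Hypothetical ids standing in for <id1>, <id2>, <id3>.
# Numeric features only: this is the part that would be clustered.
vectors = {
    "id1": [1, 2, 3, 4, 5, 6],
    "id2": [1, 2, 3, 4, 5, 6],
    "id3": [2, 3, 3, 4, 5, 7],
}

# Side index kept outside the clustering job:
# id -> fields that are needed afterwards but not used for clustering.
side_index = {
    "id1": ("BOB", "Chicago"),
    "id2": ("PHIL", "Miami"),
    "id3": ("Cindy", "NY"),
}

# Cluster assignments keyed by the vector's name.
# Assumption: written by hand here; no clustering is actually run.
assignments = {"id1": 0, "id2": 0, "id3": 1}


def join_results(vectors, side_index, assignments):
    """Re-attach the side fields to each clustered vector via its id."""
    rows = []
    for vec_id, cluster in assignments.items():
        fields = [str(x) for x in vectors[vec_id]] + list(side_index[vec_id])
        rows.append(",".join(fields) + " is Cluster " + str(cluster))
    return rows


joined = join_results(vectors, side_index, assignments)
for row in joined:
    print(row)
```

With real output you would fill the assignments mapping from whatever name and cluster id pairs your clustering run produces, and the side index could just as well live in a file or a database table, as long as it is keyed by the same id.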
