Okay, cool; I used the distance from each vector to each centroid in the mapper.
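
For reference, a minimal sketch of that per-vector computation in plain Java, not Mahout's API. It assumes Euclidean distance and dense double[] vectors; the class and method names (CentroidDistances, distancesToCentroids, nearest) are made up for illustration and are not what the patch uses.

```java
public class CentroidDistances {

    /** Euclidean distance between two dense vectors of equal length. */
    public static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Distance from one point to every centroid, in centroid order. */
    public static double[] distancesToCentroids(double[] point, double[][] centroids) {
        double[] result = new double[centroids.length];
        for (int i = 0; i < centroids.length; i++) {
            result[i] = distance(point, centroids[i]);
        }
        return result;
    }

    /** Index of the smallest distance, i.e. the nearest centroid. */
    public static int nearest(double[] distances) {
        if (distances.length == 0) {
            throw new IllegalArgumentException("no centroids given");
        }
        int best = 0;
        for (int i = 1; i < distances.length; i++) {
            if (distances[i] < distances[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical data: two centroids and one point to classify.
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        double[] point = {3.0, 4.0};

        double[] d = distancesToCentroids(point, centroids);
        int best = nearest(d);

        // Emitting the distance together with the cluster id means nobody has
        // to look the centroid up again later just to sort points by distance.
        System.out.println("cluster=" + best + " distance=" + d[best]);
    }
}
```

Keeping the distance next to the cluster id at write time is what makes ordering clustered docs by distance cheap afterwards, which is the use case raised at the bottom of the thread.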

> On Dec 1, 2013, at 10:41 AM, "Pat Ferrel (JIRA)" <[email protected]> wrote:
>
>
>     [ https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836087#comment-13836087 ]
>
> Pat Ferrel commented on MAHOUT-1030:
> ------------------------------------
>
> I hope Jeff can answer about normalized results, but I believe that had to do
> with using the pdf in lieu of using the true distance to centroid. If the
> true distance from the WPVW to the already calculated centroid is stored in
> the WPVW, I don't believe Jeff's comment applies.
>
> He's calling this a regression because the distance was in the vector and now
> is not. His proposed fix didn't work out because of the above comment. Again,
> as I recall.
>
>> Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable
>> ------------------------------------------------------------------------------------------------
>>
>>                 Key: MAHOUT-1030
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1030
>>             Project: Mahout
>>          Issue Type: Bug
>>          Components: Clustering, Integration
>>    Affects Versions: 0.7
>>            Reporter: Jeff Eastman
>>            Assignee: Andrew Musselman
>>             Fix For: 1.0, 0.9
>>
>>         Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch
>>
>>
>> Looks like this won't make it into this build. Pretty widespread impact on
>> code and tests and I don't know which properties were implemented in the old
>> version. I will create a JIRA and post my interim results.
>>> On 6/8/12 12:21 PM, Jeff Eastman wrote:
>>> That's a reversion that evidently got in when the new
>>> ClusterClassificationDriver was introduced. It should be a pretty easy fix
>>> and I will see if I can make the change before Paritosh cuts the release
>>> bits tonight.
>>>
>>>> On 6/7/12 1:00 PM, Pat Ferrel wrote:
>>>> It appears that in kmeans the clusteredPoints are now written as
>>>> WeightedVectorWritable where in mahout 0.6 they were
>>>> WeightedPropertyVectorWritable? This means that the distance from the
>>>> centroid is no longer stored here? Why? I hope I'm wrong because that is
>>>> not a welcome change. How is one to order clustered docs by distance from
>>>> cluster centroid?
>>>>
>>>> I'm sure I could calculate the distance but that would mean looking up the
>>>> centroid for the cluster id given in the above WeightedVectorWritable,
>>>> which means iterating through all the clusters for each clustered doc. In
>>>> my case the number of clusters could be fairly large.
>>>>
>>>> Am I missing something?
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.1#6144)
