Okay cool; I used the distance from each vector to each centroid in the mapper.
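Here is a minimal sketch of that mapper-side step. It is written in Python for brevity rather than Mahout's Java. The names classify_with_distance and euclidean are hypothetical. Euclidean distance stands in for whatever DistanceMeasure the job is actually configured with. The "distance" key is an assumption about how the value would be carried alongside the point.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_with_distance(vector, centroids, distance=euclidean):
    """Hypothetical mapper step: measure the distance from `vector` to every
    centroid, keep the nearest one, and return its cluster id together with
    that distance so the distance can be written out with the point."""
    best_id, best_dist = None, math.inf
    for cluster_id, centroid in centroids.items():
        d = distance(vector, centroid)
        if d < best_dist:
            best_id, best_dist = cluster_id, d
    # The "distance" key is an assumption, standing in for a property map
    # such as the one WeightedPropertyVectorWritable carries.
    return best_id, {"distance": best_dist}


centroids = {0: [0.0, 0.0], 1: [10.0, 10.0]}
cluster_id, props = classify_with_distance([1.0, 1.0], centroids)
print(cluster_id, props["distance"])
```

The mapper already has every centroid in hand when it picks the nearest one. Keeping the winning distance at that point costs nothing extra.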

> On Dec 1, 2013, at 10:41 AM, "Pat Ferrel (JIRA)" <[email protected]> wrote:
> 
> 
>    [ https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836087#comment-13836087 ]
> 
> Pat Ferrel commented on MAHOUT-1030:
> ------------------------------------
> 
> I hope Jeff can answer about normalized results, but I believe that had to do 
> with using the pdf in lieu of the true distance to the centroid. If the 
> true distance from the WPVW (WeightedPropertyVectorWritable) to the already 
> calculated centroid is stored in the WPVW, I don't believe Jeff's comment applies. 
> 
> He's calling this a regression because the distance was in the vector and now 
> is not. His proposed fix didn't work out because of the above comment. Again, 
> as I recall.
> 
>> Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable
>> ------------------------------------------------------------------------------------------------
>> 
>>                Key: MAHOUT-1030
>>                URL: https://issues.apache.org/jira/browse/MAHOUT-1030
>>            Project: Mahout
>>         Issue Type: Bug
>>         Components: Clustering, Integration
>>   Affects Versions: 0.7
>>           Reporter: Jeff Eastman
>>           Assignee: Andrew Musselman
>>            Fix For: 1.0, 0.9
>> 
>>        Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, 
>> MAHOUT-1030.patch
>> 
>> 
>> Looks like this won't make it into this build. Pretty widespread impact on 
>> code and tests and I don't know which properties were implemented in the old 
>> version. I will create a JIRA and post my interim results.
>>> On 6/8/12 12:21 PM, Jeff Eastman wrote:
>>> That's a reversion that evidently got in when the new 
>>> ClusterClassificationDriver was introduced. It should be a pretty easy fix 
>>> and I will see if I can make the change before Paritosh cuts the release 
>>> bits tonight.
>>> 
>>>> On 6/7/12 1:00 PM, Pat Ferrel wrote:
>>>> It appears that in kmeans the clusteredPoints are now written as 
>>>> WeightedVectorWritable, whereas in Mahout 0.6 they were 
>>>> WeightedPropertyVectorWritable. Does this mean that the distance from the 
>>>> centroid is no longer stored here? Why? I hope I'm wrong, because that is 
>>>> not a welcome change. How is one to order clustered docs by distance from 
>>>> the cluster centroid?
>>>> 
>>>> I'm sure I could calculate the distance, but that would mean looking up the 
>>>> centroid for the cluster id given in the above WeightedVectorWritable, 
>>>> which means iterating through all the clusters for each clustered doc. In 
>>>> my case the number of clusters could be fairly large.
>>>> 
>>>> Am I missing something?
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v6.1#6144)
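This relates to the lookup concern in the quoted thread. If the clusters are read once into a map keyed by cluster id, each clustered doc needs a single map lookup rather than a scan over all clusters. The following hypothetical Python sketch shows this. It assumes each clustered point arrives as a (cluster_id, doc_id, vector) tuple and uses Euclidean distance as the measure. It then orders each cluster's docs from nearest to farthest. The function name order_by_distance is illustrative only.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def order_by_distance(clustered_points, clusters, distance=euclidean):
    """clustered_points: iterable of (cluster_id, doc_id, vector) tuples
    (an assumed shape for the cluster id and vector each record carries).
    clusters: iterable of (cluster_id, centroid) pairs."""
    # Build the id -> centroid index once, so each point needs one dict
    # lookup instead of a scan over every cluster.
    centroid_by_id = dict(clusters)
    ranked = defaultdict(list)
    for cluster_id, doc_id, vector in clustered_points:
        d = distance(vector, centroid_by_id[cluster_id])
        ranked[cluster_id].append((d, doc_id))
    # Sort each cluster's members by distance only, nearest first.
    return {
        cid: [doc for _, doc in sorted(members, key=lambda m: m[0])]
        for cid, members in ranked.items()
    }


points = [(0, "a", [3.0, 4.0]), (0, "b", [1.0, 0.0]), (1, "c", [10.0, 10.0])]
clusters = [(0, [0.0, 0.0]), (1, [10.0, 11.0])]
print(order_by_distance(points, clusters))
```

Building the map costs one pass over the clusters. After that, the per-doc cost no longer depends on how many clusters there are, which matters when that number is large.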
