Jeff Eastman created MAHOUT-1030:
------------------------------------

             Summary: Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable
                 Key: MAHOUT-1030
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1030
             Project: Mahout
          Issue Type: Bug
          Components: Clustering, Integration
    Affects Versions: 0.7
            Reporter: Jeff Eastman


Looks like this won't make it into this build. The fix has a pretty widespread 
impact on code and tests, and I don't know which properties were implemented in 
the old version. I will create a JIRA and post my interim results.

On 6/8/12 12:21 PM, Jeff Eastman wrote:
> That's a regression that evidently got in when the new 
> ClusterClassificationDriver was introduced. It should be a pretty easy fix 
> and I will see if I can make the change before Paritosh cuts the release bits 
> tonight.
>
> On 6/7/12 1:00 PM, Pat Ferrel wrote:
>> It appears that in kmeans the clusteredPoints are now written as 
>> WeightedVectorWritable, whereas in Mahout 0.6 they were 
>> WeightedPropertyVectorWritable? This means that the distance from the 
>> centroid is no longer stored here? Why? I hope I'm wrong because that is not 
>> a welcome change. How is one to order clustered docs by distance from 
>> cluster centroid?
>>
>> I'm sure I could calculate the distance but that would mean looking up the 
>> centroid for the cluster id given in the above WeightedVectorWritable, which 
>> means iterating through all the clusters for each clustered doc. In my case 
>> the number of clusters could be fairly large.
>>
>> Am I missing something?
>>
>>
>
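
As an interim workaround for the lookup cost described in the quoted message, the centroids can be loaded once into a map keyed by cluster id, so each clustered point needs a single hash lookup rather than a pass over all clusters. The sketch below is plain Java and does not use the Mahout API: ClusteredPoint, ScoredPoint, scoreAndSort and the double[] vectors are hypothetical stand-ins, and Euclidean distance is an assumption (substitute whatever distance measure the clustering run used).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CentroidDistanceSketch {

    /** Hypothetical stand-in for a clustered point: doc id, assigned cluster id, vector. */
    public static final class ClusteredPoint {
        public final String docId;
        public final int clusterId;
        public final double[] vector;

        public ClusteredPoint(String docId, int clusterId, double[] vector) {
            this.docId = docId;
            this.clusterId = clusterId;
            this.vector = vector;
        }
    }

    /** Hypothetical result holder: a doc and its distance to its own cluster's centroid. */
    public static final class ScoredPoint {
        public final String docId;
        public final int clusterId;
        public final double distance;

        public ScoredPoint(String docId, int clusterId, double distance) {
            this.docId = docId;
            this.clusterId = clusterId;
            this.distance = distance;
        }
    }

    /** Euclidean distance (an assumption; use the measure the clustering was run with). */
    public static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Looks up each point's centroid by cluster id (one map lookup per point),
     * computes the distance, and orders the results by cluster id, then by
     * ascending distance from the centroid.
     */
    public static List<ScoredPoint> scoreAndSort(Map<Integer, double[]> centroidsById,
                                                 List<ClusteredPoint> points) {
        List<ScoredPoint> scored = new ArrayList<>();
        for (ClusteredPoint p : points) {
            double[] centroid = centroidsById.get(p.clusterId);
            if (centroid == null) {
                throw new IllegalStateException("no centroid for cluster " + p.clusterId);
            }
            scored.add(new ScoredPoint(p.docId, p.clusterId, euclidean(p.vector, centroid)));
        }
        scored.sort(Comparator.comparingInt((ScoredPoint s) -> s.clusterId)
                .thenComparingDouble(s -> s.distance));
        return scored;
    }

    public static void main(String[] args) {
        // Centroids are read once and indexed by cluster id.
        Map<Integer, double[]> centroids = new HashMap<>();
        centroids.put(0, new double[] {0.0, 0.0});
        centroids.put(1, new double[] {10.0, 10.0});

        List<ClusteredPoint> points = new ArrayList<>();
        points.add(new ClusteredPoint("a", 0, new double[] {3.0, 4.0}));
        points.add(new ClusteredPoint("b", 0, new double[] {1.0, 0.0}));
        points.add(new ClusteredPoint("c", 1, new double[] {10.0, 12.0}));

        for (ScoredPoint s : scoreAndSort(centroids, points)) {
            System.out.println(s.clusterId + "\t" + s.docId + "\t" + s.distance);
        }
    }
}
```

With the map in place, the cost is one pass over the centroids plus one lookup per point. This assumes the full set of centroids fits in memory, which may or may not hold for a very large number of clusters.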


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
