[ 
https://issues.apache.org/jira/browse/MAHOUT-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201252#comment-13201252
 ] 

Gaurav Redkar commented on MAHOUT-966:
--------------------------------------

Hello,

As Paritosh suggested, I tried specifying the -cl option while clustering, but 
I am still seeing the same problem. The number of members printed by the 
clusterdumper code matches the number of points generated by the 
ClusterOutputPostProcessor for each cluster. However, this number does not match 
the value 'n' reported for that cluster in the clusterdumper output. 

Also, while running the algorithm on a different dataset, the clustering 
produced two clusters with the same cluster identifier, and that cluster 
contained some of the points twice. Any idea why this is happening?
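
To make the mismatch easier to reproduce, here is a minimal sketch of the check 
I have in mind. It assumes the cluster ids and point ids have already been 
exported to plain (cluster id, point id) pairs. The function name and the input 
layout are hypothetical and are not part of Mahout.

```python
from collections import Counter


def check_clusters(reported_n, members):
    """Compare reported cluster sizes with the points actually listed.

    reported_n: dict of cluster id -> the 'n' value printed for that cluster
    members:    list of (cluster id, point id) pairs, one per output record
    (Both inputs are assumed to be exported by hand; this is not a Mahout API.)
    """
    pair_counts = Counter(members)

    # Points that appear more than once under the same cluster id.
    duplicates = sorted(pair for pair, count in pair_counts.items() if count > 1)

    # Number of distinct points per cluster id.
    actual = Counter(cluster_id for cluster_id, _ in pair_counts)

    # Clusters whose reported 'n' differs from the distinct point count,
    # stored as cluster id -> (reported n, actual distinct points).
    mismatches = {
        cid: (n, actual.get(cid, 0))
        for cid, n in reported_n.items()
        if n != actual.get(cid, 0)
    }
    return mismatches, duplicates
```

Records from two clusters that share an identifier would be merged under that 
one id here, so a repeated point shows up in the duplicates list.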

The command used for the clustering job is:

bin/mahout org.apache.mahout.clustering.syntheticcontrol.meanshift.Job \
  -x 15 -cd 5 -t1 100 -t2 30 -cl \
  -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
  -i testdata -ow -o output

I am attaching the dataset on which I tried the clustering. Kindly share your 
suggestions on it.

                
> Mismatch in the number of points given by the clusterDumper and 
> ClusterOutputPostProcessor
> -------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-966
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-966
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.6
>         Environment: hadoop 0.20.2 mahout 0.6 
>            Reporter: Gaurav Redkar
>            Priority: Minor
>         Attachments: points100dCCNorm.txt
>
>
>  After running the post processor, the number of points that each cluster 
> contains does not match the number of points each cluster should contain as 
> stated by clusterdumper.
>  
> MSV-287{ n=90 c=[0.05195, 0.05675, 0.07151, 0.05713, 0.06946,...}
> MSV-145{ n=90 c=[0.93685, 0.93071, 0.93641, 0.94629, 0.94409,..}
> The n mentioned in clusters-n-final against each cluster is different from 
> the number of points actually contained in d directory for each cluster. Any 
> idea why this is happening?
