[
https://issues.apache.org/jira/browse/MAHOUT-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214537#comment-13214537
]
Gaurav Redkar commented on MAHOUT-966:
--------------------------------------
I modified the clusterdumper and mean shift clustering source code so that
the clusterdumper outputs the number of boundPoints (that is, the size of the
"boundPoints" list) along with the numPoints, radius and center for
each cluster.
When I ran the clustering job on synthetic_control.data using the following
parameters:
bin/mahout org.apache.mahout.clustering.syntheticcontrol.meanshift.Job \
  -x 25 -cd 5 -t1 50 -t2 10 \
  -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
  -i testdata -ow -o output -cl
some of the clusters had different values for "numPoints" and the size
of "boundPoints".
What I want to know is: what is the difference between "numPoints" and
"boundPoints", and shouldn't the size of the "boundPoints" list be the same as
"numPoints"?
Also, regarding this thread: the number of members printed for each cluster
matched the number of boundPoints for that cluster.
Any suggestions?
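
To make the comparison concrete, here is a minimal sketch of how the reported
and actual counts could be checked against each other. It assumes the
clusterdumper lines look like the "MSV-287{ n=90 c=[..." lines in the issue
description quoted below. The function names and the dict of actual member
counts are hypothetical and not part of Mahout; the counts in the example are
made up for illustration.

```python
import re

# Matches the cluster id and the n= value in lines such as
# "MSV-287{ n=90 c=[0.05195, ...}" (format taken from the issue description).
CLUSTER_LINE = re.compile(r"^\s*(?P<id>[A-Za-z]+-\d+)\{\s*n=(?P<n>\d+)\b")


def parse_reported_counts(dump_lines):
    """Return {cluster_id: n} for every line that looks like a cluster header."""
    counts = {}
    for line in dump_lines:
        match = CLUSTER_LINE.match(line)
        if match:
            counts[match.group("id")] = int(match.group("n"))
    return counts


def find_mismatches(reported, actual):
    """Return {cluster_id: (reported_n, actual_n)} where the two counts differ.

    `actual` is a hypothetical dict of cluster id -> number of points found
    in that cluster's output directory; clusters missing from it count as 0.
    """
    return {
        cid: (n, actual.get(cid, 0))
        for cid, n in reported.items()
        if n != actual.get(cid, 0)
    }


if __name__ == "__main__":
    dump = [
        "MSV-287{ n=90 c=[0.05195, 0.05675, 0.07151, 0.05713, 0.06946,...}",
        "MSV-145{ n=90 c=[0.93685, 0.93071, 0.93641, 0.94629, 0.94409,..}",
    ]
    # Made-up member counts, for illustration only.
    actual_counts = {"MSV-287": 90, "MSV-145": 72}
    print(find_mismatches(parse_reported_counts(dump), actual_counts))
```

The same check would work for "numPoints" versus the size of "boundPoints" if
both values were collected per cluster id.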
> Mismatch in the number of points given by the clusterDumper and
> ClusterOutputPostProcessor
> -------------------------------------------------------------------------------------------
>
> Key: MAHOUT-966
> URL: https://issues.apache.org/jira/browse/MAHOUT-966
> Project: Mahout
> Issue Type: Bug
> Components: Integration
> Affects Versions: 0.6
> Environment: hadoop 0.20.2 mahout 0.6
> Reporter: Gaurav Redkar
> Priority: Minor
> Attachments: cluster-dumper-output.txt, clusterpp-output.txt,
> mtestdata.txt, points100dCCNorm.txt
>
>
> After running the post processor, the number of points that each cluster
> contains does not match the number of points each cluster should contain as
> stated by clusterdumper.
>
> MSV-287{ n=90 c=[0.05195, 0.05675, 0.07151, 0.05713, 0.06946,...}
> MSV-145{ n=90 c=[0.93685, 0.93071, 0.93641, 0.94629, 0.94409,..}
> The n reported in clusters-n-final for each cluster is different from
> the number of points actually contained in the directory for each cluster. Any
> idea why this is happening?