[
https://issues.apache.org/jira/browse/MAHOUT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978473#comment-13978473
]
Suneel Marthi commented on MAHOUT-1468:
---------------------------------------
This implementation seems to run 'ok' on the Reuters dataset (it's slow, though,
compared to other clustering algorithms on the same dataset), but I have never
had success running it on real-world datasets (and neither have others, if you
look at the mailing lists).
It's about the distanceCutOff update I was alluding to, and what you say makes
sense: the cutoff needs to be updated.
> Creating a new page for StreamingKMeans documentation on mahout website
> -----------------------------------------------------------------------
>
> Key: MAHOUT-1468
> URL: https://issues.apache.org/jira/browse/MAHOUT-1468
> Project: Mahout
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 1.0
> Reporter: Pavan Kumar N
> Assignee: Andrew Musselman
> Labels: Documentation
> Fix For: 1.0
>
> Attachments: StreamingKMeans.txt
>
>
> A separate page is required with a description and overview of the Streaming
> K Means algorithm, explaining the various parameters that can be used in
> streamingkmeans and the strategy for parallelization, with a link to this
> paper:
> http://papers.nips.cc/paper/3812-streaming-k-means-approximation.pdf
--
This message was sent by Atlassian JIRA
(v6.2#6252)