[ https://issues.apache.org/jira/browse/MAHOUT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978393#comment-13978393 ]
Suneel Marthi commented on MAHOUT-1468:
---------------------------------------
Maxim, what you say makes sense. Not sure if you have tried running Streaming
KMeans on any large datasets, but in my experience the clusterInternal()
method was one of the big choke points. There's still some lint in this
implementation that needs cleaning up; see MAHOUT-1469 for details.
Regarding the question about the kinds of DistanceMeasures that can be fed
into this algorithm, see the comments in MAHOUT-1469.
> Creating a new page for StreamingKMeans documentation on mahout website
> -----------------------------------------------------------------------
>
> Key: MAHOUT-1468
> URL: https://issues.apache.org/jira/browse/MAHOUT-1468
> Project: Mahout
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 1.0
> Reporter: Pavan Kumar N
> Assignee: Andrew Musselman
> Labels: Documentation
> Fix For: 1.0
>
> Attachments: StreamingKMeans.txt
>
>
> A separate page is required for the Streaming K-Means algorithm, with a
> description and overview, an explanation of the various parameters that can
> be used in streamingkmeans, the strategy for parallelization, and a link to
> this paper:
> http://papers.nips.cc/paper/3812-streaming-k-means-approximation.pdf