I didn't know about BFR at the time, and in any case I always tend to choose simplicity.
The theoretical bounds for streaming k-means are also persuasive. The other strong-ish candidate is k-means++, but it doesn't have the required sketch architecture in the form that the authors analyzed.

BFR is a reasonable candidate for follow-on work, but we should drive the current algorithm to conclusion first.

On Mon, Dec 3, 2012 at 6:47 PM, Dan Filimon wrote:

> My question is... why did we pick streaming k-means in particular as
> opposed to this algorithm? BFR seems like a decent candidate for the
> mapper clustering, and while it looks more complex (algorithmically), I
> wonder how its clustering quality compares to streaming k-means.
>
> What are your thoughts on this?
>
