On Mon, Nov 5, 2012 at 4:44 AM, Dan Filimon <[email protected]> wrote:

> Ted told me that Mahout Centroids [1] are Weighted vectors that
> additionally perform a Welford-style update of a vector.

I think that there may be an older Centroid definition that is different
from this.

> So, in the code, for an existing Centroid c, with weight w_c, updating it
> with a new Vector v whose weight is w_v, the result of an "update" is:
>
> (w_c * c[i] + w_v * v[i]) / (w_c + w_v), for all elements (i is the index)

Correct.

> Since weights actually mean the number of elements in a certain cluster,
> merging two clusters is exactly the operation described above.
>
> Why is this called a Welford update?

See here:
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm

> Also, why is the original Vector's function named assign?

This is just one of several assign functions. The name assign comes from
the original name used in the Colt library and is intended to indicate that
it is a destructive operation rather than a copying operation like times().

> It's really an implementation of the higher order function zipwith [2].

I don't know that zipwith is a more common name. Haskell has historically
had a very small community.
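
For what it's worth, here is a minimal Python sketch of the two ideas discussed above. It is not Mahout's or Colt's code: the names update_centroid and assign_in_place are hypothetical, and plain lists stand in for Vectors. The first function writes the weighted-mean update in the incremental (Welford-style) form, which is algebraically the same as (w_c * c[i] + w_v * v[i]) / (w_c + w_v). The second shows what "destructive zipwith" means.

```python
def update_centroid(c, w_c, v, w_v):
    """Merge vector v (weight w_v) into centroid c (weight w_c).

    Hypothetical helper, not Mahout's API. Returns (new_centroid, new_weight).
    """
    if len(c) != len(v):
        raise ValueError("vectors must have the same length")
    total = w_c + w_v
    if total == 0:
        raise ValueError("combined weight must be non-zero")
    # Incremental form: move c toward v by the fraction w_v / total.
    # Equal to (w_c * c[i] + w_v * v[i]) / (w_c + w_v) for every index i.
    merged = [ci + (w_v / total) * (vi - ci) for ci, vi in zip(c, v)]
    return merged, total


def assign_in_place(xs, ys, f):
    """Destructive zipWith: overwrite xs[i] with f(xs[i], ys[i]).

    Hypothetical helper illustrating the assign naming; xs is modified
    and also returned, while ys is left untouched.
    """
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    for i, y in enumerate(ys):
        xs[i] = f(xs[i], y)
    return xs
```

As a quick check, merging a centroid at (1, 2) with weight 3 and a vector at (5, 6) with weight 1 gives (2, 3) with weight 4, which matches the formula quoted above.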
