How about a new drm API:

    type ReduceFunc = (Vector, Vector) => Vector
    def reduce(rf: ReduceFunc): Vector = { ... }

The row keys in this case are ignored/erased, but I'm not sure if they are
useful (or even meaningful) for reduction. Such an API should be sufficient
for kmeans (in combination with mapBlock). But does this feel generic enough?
Maybe a good start? Feedback welcome.

On Sun, Jul 13, 2014 at 6:34 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Yeah. Collect was where I had gotten, and was rather sulky about the
> results.
>
> It does seem like a reduce is going to be necessary.
>
> Anybody else have thoughts on this?
>
> Sent from my iPhone
>
> > On Jul 13, 2014, at 17:58, Anand Avati <av...@gluster.org> wrote:
> >
> > collect(), hoping the result fits in memory, and do the reduction
> > in-core.
> > I think some kind of a reduce operator needs to be introduced for doing
> > even simple things like scalable kmeans. Haven't thought of how it would
> > look yet.
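
To make the reduce proposal at the top of this message a bit more concrete, here is a rough, self-contained sketch of the semantics I have in mind. It does not use the real DRM classes: `Vector` is aliased to `Array[Double]` and `ToyDrm` is a hypothetical stand-in that just holds rows grouped into partitions, with row keys already dropped. It assumes `rf` is associative and commutative, so that the result does not depend on how rows are partitioned.

```scala
object ReduceSketch {
  // Assumption: a plain array stands in for Mahout's Vector type.
  type Vector = Array[Double]
  type ReduceFunc = (Vector, Vector) => Vector

  // Hypothetical stand-in for a DRM: rows grouped into partitions, row keys erased.
  final case class ToyDrm(partitions: Seq[Seq[Vector]]) {
    // Reduce each non-empty partition locally, then combine the partial results.
    // Throws if there are no rows at all, since there is nothing to reduce.
    def reduce(rf: ReduceFunc): Vector =
      partitions.filter(_.nonEmpty).map(_.reduce(rf)).reduce(rf)
  }

  def main(args: Array[String]): Unit = {
    val drm = ToyDrm(Seq(
      Seq(Array(1.0, 2.0), Array(3.0, 4.0)),
      Seq(Array(5.0, 6.0))
    ))
    // Element-wise addition, so the reduction yields column sums.
    val add: ReduceFunc = (a, b) => a.zip(b).map { case (x, y) => x + y }
    println(drm.reduce(add).mkString(", "))
  }
}
```

For kmeans, the idea would be that mapBlock produces per-block partial sums and a reduce like this adds them up without a collect(). Whether that maps cleanly onto the actual backends is something I have not checked.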