strictly speaking, it would be A.rdd.reduce(_._2 + _._2). oh well
On Sun, Jul 13, 2014 at 10:08 PM, Dmitriy Lyubimov <[email protected]> wrote:

> the only problem with that I see is that would not be algebra any more.
> that would be functional programming, and as such there are probably better
> frameworks to address these kind of things than a DRM. Drm currently
> suggest just to exist to engine level primitives, i.e. do something like
> A.rdd.reduce(_+_).
>
>
> On Sun, Jul 13, 2014 at 10:02 PM, Anand Avati <[email protected]> wrote:
>
>> How about a new drm API:
>>
>> type ReduceFunc = (Vector, Vector) => Vector
>>
>> def reduce(rf: ReduceFunc): Vector = { ... }
>>
>> The row keys in this case are ignored/erased, but I'm not sure if they are
>> useful (or even meaningful) for reduction. Such an API should be sufficient
>> for kmeans (in combination with mapBlock). But does this feel generic
>> enough? Maybe a good start? Feedback welcome.
>>
>>
>> On Sun, Jul 13, 2014 at 6:34 PM, Ted Dunning <[email protected]> wrote:
>>
>>> Yeah. Collect was where I had gotten, and was rather sulky about the
>>> results.
>>>
>>> It does seem like a reduce is going to be necessary.
>>>
>>> Anybody else have thoughts on this?
>>>
>>> Sent from my iPhone
>>>
>>>> On Jul 13, 2014, at 17:58, Anand Avati <[email protected]> wrote:
>>>>
>>>> collect(), hoping the result fits in memory, and do the reduction in-core.
>>>> I think some kind of a reduce operator needs to be introduced for doing
>>>> even simple things like scalable kmeans. Haven't thought of how it would
>>>> look yet.
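For what it's worth, a minimal in-core sketch of the proposed reduce semantics (names here are assumptions: `Vec` as `Array[Double]` stands in for Mahout's `Vector`, and `reduce` folds a local `Seq` of keyed rows rather than a distributed DRM):

```scala
object ReduceSketch extends App {
  // Hypothetical stand-ins; not the actual Mahout/Spark types.
  type Vec = Array[Double]
  type ReduceFunc = (Vec, Vec) => Vec   // as proposed in the thread

  // Row keys are erased: fold only the row vectors, as the proposal suggests.
  def reduce(rows: Seq[(Int, Vec)])(rf: ReduceFunc): Vec =
    rows.map(_._2).reduceLeft(rf)

  // Element-wise sum, i.e. the thread's `_ + _`, spelled out for arrays.
  val plus: ReduceFunc = (a, b) => a.zip(b).map { case (x, y) => x + y }

  val drm = Seq(0 -> Array(1.0, 2.0), 1 -> Array(3.0, 4.0))
  val colSums = reduce(drm)(plus)       // element-wise sum of all rows

  println(colSums.mkString(","))        // prints "4.0,6.0"
}
```

This mirrors `A.rdd.map(_._2).reduce(_ + _)` on the engine side; whether keys should really be erased (vs. reduced per-key) is the open question in the thread.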
