How about a new DRM API:

  type ReduceFunc = (Vector, Vector) => Vector

  def reduce(rf: ReduceFunc): Vector = { ... }

The row keys are ignored/erased in this case; I'm not sure they are useful
(or even meaningful) for a reduction. Such an API should be sufficient for
kmeans (in combination with mapBlock). But does it feel generic enough?
Maybe a good start? Feedback welcome.
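
To make the proposed semantics concrete, here is a minimal in-core sketch of
what reduce would compute. It is only an illustration under stated
assumptions: Vector is aliased to Array[Double] rather than Mahout's Vector,
and RowsOnly and sum are hypothetical names standing in for a DRM whose row
keys have already been dropped. It is not an actual DRM implementation.

```scala
object ReduceSketch {
  // Assumption: a plain array stands in for Mahout's Vector type.
  type Vector = Array[Double]
  type ReduceFunc = (Vector, Vector) => Vector

  // Hypothetical stand-in for a DRM: only the rows, keys already erased.
  final case class RowsOnly(rows: Seq[Vector]) {
    // Folds all rows pairwise with rf; throws if there are no rows.
    def reduce(rf: ReduceFunc): Vector = rows.reduce(rf)
  }

  // Element-wise sum, e.g. to accumulate the points assigned to one centroid.
  val sum: ReduceFunc = (a, b) => a.zip(b).map { case (x, y) => x + y }

  def main(args: Array[String]): Unit = {
    val drm = RowsOnly(Seq(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0)))
    val total = drm.reduce(sum)
    println(total.mkString(", ")) // prints "9.0, 12.0"
  }
}
```

In a distributed backend the rows would be combined in an arbitrary order
across partitions, so rf would presumably need to be associative and
commutative (as the element-wise sum above is).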



On Sun, Jul 13, 2014 at 6:34 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

>
> Yeah.  Collect was where I had gotten, and was rather sulky about the
> results.
>
> It does seem like a reduce is going to be necessary.
>
> Anybody else have thoughts on this?
>
> Sent from my iPhone
>
> > On Jul 13, 2014, at 17:58, Anand Avati <av...@gluster.org> wrote:
> >
> > collect(), hoping the result fits in memory, and do the reduction
> > in-core.
> > I think some kind of a reduce operator needs to be introduced for doing
> > even simple things like scalable kmeans. Haven't thought of how it would
> > look yet.
>
