On Thu, Feb 18, 2010 at 3:58 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Actually, this makes the case that we should have something like:
>
>   microMapReduce(aggregatorFunction, aggregatorUnit, binaryMapFunction,
>       vectorA, vectorB)
>

What would this method mean?  What does aggregatorUnit mean?  What would this
be a method on?

The reason we need a specialized function is to do things in a nicely mutating
way: Hadoop M/R is functional in the lispy sense - read-only immutable objects
(once on the filesystem).  The only thing more we need beyond what we have now
is in the assign method.  Currently we have it with a map, with reduce being
the identity (with replacement - the calling object becomes the output of the
reduce, i.e. the output of the map):

  Vector.assign(Vector other, BinaryFunction map) {
    // implemented effectively as follows in AbstractVector
    for (int i = 0; i < size(); i++) {
      setQuick(i, map.apply(getQuick(i), other.getQuick(i)));
    }
    return this;
  }

Something more powerful (and sparse-efficient) would be:

  Vector.assign(Vector other, BinaryFunction map, BinaryFunction reduce, boolean sparse) {
    Iterator<Element> it = sparse ? other.iterateNonZero() : other.iterateAll();
    while (it.hasNext()) {
      Element e = it.next();
      int i = e.index();
      setQuick(i, map.apply(getQuick(i), e.get()));
    }
    // do stuff with the reduce - what exactly?
    return this;
  }

(is the reduce necessary?)

  -jake
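For concreteness, here is a minimal, self-contained sketch of what the reduce
could add.  This is not Mahout's actual API - SketchVector, SketchDemo, and the
one-method BinaryFunction below are hypothetical stand-ins - but it shows one
reading of the proposal: keep assign as map-with-replacement, and let a reduce
fold the mapped values into a scalar instead (map = times, reduce = plus gives
a dot product in a single pass):

  // Hypothetical stand-in for the map/reduce function type discussed above.
  interface BinaryFunction {
    double apply(double a, double b);
  }

  // Dense toy vector; just enough to illustrate the two operations.
  class SketchVector {
    private final double[] values;

    SketchVector(double... values) {
      this.values = values;
    }

    double getQuick(int i)         { return values[i]; }
    void setQuick(int i, double v) { values[i] = v; }
    int size()                     { return values.length; }

    // Current behavior: map with assignment, reduce is effectively the
    // identity - "this" becomes the output of the map.
    SketchVector assign(SketchVector other, BinaryFunction map) {
      for (int i = 0; i < size(); i++) {
        setQuick(i, map.apply(getQuick(i), other.getQuick(i)));
      }
      return this;
    }

    // With a reduce: fold the mapped values into a single scalar in the same
    // pass, without mutating either vector.
    double aggregate(SketchVector other, BinaryFunction map, BinaryFunction reduce) {
      double acc = map.apply(getQuick(0), other.getQuick(0));
      for (int i = 1; i < size(); i++) {
        acc = reduce.apply(acc, map.apply(getQuick(i), other.getQuick(i)));
      }
      return acc;
    }
  }

  class SketchDemo {
    public static void main(String[] args) {
      BinaryFunction times = (a, b) -> a * b;
      BinaryFunction plus  = (a, b) -> a + b;

      SketchVector a = new SketchVector(1, 2, 3);
      SketchVector b = new SketchVector(4, 5, 6);

      System.out.println(a.aggregate(b, times, plus)); // 32.0 = 1*4 + 2*5 + 3*6
      a.assign(b, plus);                               // a is now [5, 7, 9]
    }
  }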