On Thu, Feb 18, 2010 at 3:58 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Actually, this makes the case that we should have something like:
>
>   microMapReduce(aggregatorFunction, aggregatorUnit, binaryMapFunction,
>       vectorA, vectorB)
>

What would this method mean?  What does aggregatorUnit mean?  What would this
be a method on?

The reason we need a specialized function is to do things in a nicely mutating
way: Hadoop M/R is functional in the lispy sense - read-only immutable objects
(once on the filesystem).  The only thing more we need beyond what we have now
is in the assign method.  Currently we have it with a map, with reduce being
the identity (with replacement - the calling object becomes the output of the
reduce, i.e. the output of the map):

  Vector.assign(Vector other, BinaryFunction map) {
    // implemented effectively as follows in AbstractVector
    for (int i = 0; i < size(); i++) {
      setQuick(i, map.apply(getQuick(i), other.getQuick(i)));
    }
    return this;
  }

Something more powerful (and sparse-efficient) would be:

  Vector.assign(Vector other, BinaryFunction map, BinaryFunction reduce, boolean sparse) {
    Iterator<Element> it = sparse ? other.iterateNonZero() : other.iterateAll();
    while (it.hasNext()) {
      Element e = it.next();
      int i = e.index();
      setQuick(i, map.apply(getQuick(i), e.get()));
    }
    // do stuff with the reduce - what exactly?
    return this;
  }

(is the reduce necessary?)

  -jake
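For concreteness, here is a minimal, self-contained sketch of what the reduce
could add.  This is not Mahout's actual API - SketchVector, SketchDemo, and the
one-method BinaryFunction below are hypothetical stand-ins - but it shows one
reading of the proposal: keep assign as map-with-replacement, and let a reduce
fold the mapped values into a scalar instead (map = times, reduce = plus gives
a dot product in a single pass):

  // Hypothetical stand-in for the map/reduce function type discussed above.
  interface BinaryFunction {
    double apply(double a, double b);
  }

  // Dense toy vector; just enough to illustrate the two operations.
  class SketchVector {
    private final double[] values;

    SketchVector(double... values) {
      this.values = values;
    }

    double getQuick(int i)         { return values[i]; }
    void setQuick(int i, double v) { values[i] = v; }
    int size()                     { return values.length; }

    // Current behavior: map with assignment, reduce is effectively the
    // identity - "this" becomes the output of the map.
    SketchVector assign(SketchVector other, BinaryFunction map) {
      for (int i = 0; i < size(); i++) {
        setQuick(i, map.apply(getQuick(i), other.getQuick(i)));
      }
      return this;
    }

    // With a reduce: fold the mapped values into a single scalar in the same
    // pass, without mutating either vector.
    double aggregate(SketchVector other, BinaryFunction map, BinaryFunction reduce) {
      double acc = map.apply(getQuick(0), other.getQuick(0));
      for (int i = 1; i < size(); i++) {
        acc = reduce.apply(acc, map.apply(getQuick(i), other.getQuick(i)));
      }
      return acc;
    }
  }

  class SketchDemo {
    public static void main(String[] args) {
      BinaryFunction times = (a, b) -> a * b;
      BinaryFunction plus  = (a, b) -> a + b;

      SketchVector a = new SketchVector(1, 2, 3);
      SketchVector b = new SketchVector(4, 5, 6);

      System.out.println(a.aggregate(b, times, plus)); // 32.0 = 1*4 + 2*5 + 3*6
      a.assign(b, plus);                               // a is now [5, 7, 9]
    }
  }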