Ted Dunning created MAHOUT-1582:
-----------------------------------

             Summary: Create simpler row and column aggregation API at local level
                 Key: MAHOUT-1582
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1582
             Project: Mahout
          Issue Type: Bug
            Reporter: Ted Dunning


The issue is that the current row and column aggregation API makes it difficult 
to do anything but row-by-row aggregation using anonymous classes.  There is no 
scope for being aware of locality, nor for using the well-known function 
definitions in Functions.  This makes many optimizations impossible, and many 
of those are optimizations that we want to have.  An example would be summing 
the absolute values of the elements.  With the current API, it would be very 
hard to exploit sparsity or to cope with iterating in the wrong direction for 
the matrix's storage order, but with a different API, this should be easy.

What I suggest is an API of this form:

{code}
   Vector aggregateRows(DoubleDoubleFunction combiner, DoubleFunction mapper)
{code}
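To pin down the intended semantics, here is a minimal sketch, assuming a plain dense row-major `double[][]` in place of Mahout's `Matrix`, with `java.util.function.DoubleBinaryOperator` and `DoubleUnaryOperator` standing in for `DoubleDoubleFunction` and `DoubleFunction`. The class and method bodies are hypothetical, not Mahout code: each output element is the combiner folded over the mapped entries of one row.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

class AggregateRowsSketch {
    // Hypothetical sketch: a dense row-major double[][] stands in for Mahout's Matrix,
    // DoubleBinaryOperator for DoubleDoubleFunction, DoubleUnaryOperator for DoubleFunction.
    // Assumes every row has at least one element.
    static double[] aggregateRows(double[][] m, DoubleBinaryOperator combiner,
                                  DoubleUnaryOperator mapper) {
        double[] result = new double[m.length];
        for (int r = 0; r < m.length; r++) {
            // Seed with the mapped first element so the combiner needs no identity value.
            double acc = mapper.applyAsDouble(m[r][0]);
            for (int c = 1; c < m[r].length; c++) {
                acc = combiner.applyAsDouble(acc, mapper.applyAsDouble(m[r][c]));
            }
            result[r] = acc;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] m = {{1, -2, 0}, {0, 0, -3}};
        // Sum of absolute values per row: the analogue of aggregateRows(PLUS, ABS).
        double[] sums = aggregateRows(m, Double::sum, Math::abs);
        System.out.println(Arrays.toString(sums)); // prints [3.0, 3.0]
    }
}
```

Because the mapper and combiner are passed separately rather than hidden inside an anonymous per-row function, the implementation is free to choose its own loop order.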

This will produce a vector with one element per row of the original matrix.  The 
nice thing here is that if the matrix is row major, we can iterate over rows and 
accumulate a value for each row, using sparsity where available.  On the other 
hand, if the matrix is column major, we can keep a vector of accumulators, one 
per row, and still use sparsity as appropriate.
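The column-major strategy can be sketched as follows, assuming a hypothetical `double[][] cols` where `cols[c][r]` is the entry at row r, column c, and again using the `java.util.function` interfaces as stand-ins for Mahout's function classes. The loop walks each column contiguously while updating one accumulator per row.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

class ColumnMajorAggregation {
    // Hypothetical column-major storage: cols[c][r] holds the entry at row r, column c.
    // Keeps one accumulator per row and updates all of them while walking each column.
    // Assumes at least one column and that all columns have the same length.
    static double[] aggregateRows(double[][] cols, DoubleBinaryOperator combiner,
                                  DoubleUnaryOperator mapper) {
        int numRows = cols[0].length;
        double[] acc = new double[numRows];
        // Seed each row's accumulator from column 0.
        for (int r = 0; r < numRows; r++) {
            acc[r] = mapper.applyAsDouble(cols[0][r]);
        }
        // Fold the remaining columns in, one contiguous column at a time.
        for (int c = 1; c < cols.length; c++) {
            double[] column = cols[c];
            for (int r = 0; r < numRows; r++) {
                acc[r] = combiner.applyAsDouble(acc[r], mapper.applyAsDouble(column[r]));
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        // The 2x3 matrix {{1, -2, 0}, {0, 0, -3}} stored column by column.
        double[][] cols = {{1, 0}, {-2, 0}, {0, -3}};
        System.out.println(Arrays.toString(aggregateRows(cols, Double::sum, Math::abs))); // prints [3.0, 3.0]
    }
}
```

Since the columns are folded in order, each row's elements are still combined left to right, the same order a row-major pass would use, so both layouts give identical results even for a non-associative combiner.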

Sparsity becomes usable because the matrix code would then control both of the 
loops involved and would also have visibility into properties of the map and 
combine functions.  For instance, ABS(0) == 0 and 0 is the identity for PLUS, so 
skipping the zero entries cannot change the result and we can use a sparse 
iterator.
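A minimal sketch of that check, under stated assumptions: `SparseRow` is a hypothetical row type storing only non-zero entries, and `PLUS`/`ABS` are hypothetical singleton constants standing in for `Functions.PLUS` and `Functions.ABS`. Recognizing PLUS by reference identity is an assumption made for illustration; it does not show how Mahout actually inspects function properties.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

class SparseRowAggregation {
    // Hypothetical "well-known" function constants, standing in for Functions.PLUS / ABS.
    static final DoubleBinaryOperator PLUS = Double::sum;
    static final DoubleUnaryOperator ABS = Math::abs;

    // Hypothetical sparse row: only the non-zero entries are stored.
    static final class SparseRow {
        final int size;        // logical length of the row (assumed >= 1)
        final int[] indices;   // column index of each stored entry
        final double[] values; // value of each stored entry

        SparseRow(int size, int[] indices, double[] values) {
            this.size = size;
            this.indices = indices;
            this.values = values;
        }
    }

    static double aggregate(SparseRow row, DoubleBinaryOperator combiner,
                            DoubleUnaryOperator mapper) {
        // Recognizing PLUS by identity is what a fixed set of known functions makes possible.
        if (combiner == PLUS && mapper.applyAsDouble(0.0) == 0.0) {
            // Every skipped zero would map to 0, the identity for PLUS,
            // so visiting only the stored entries gives the same sum.
            double acc = 0.0;
            for (double v : row.values) {
                acc += mapper.applyAsDouble(v);
            }
            return acc;
        }
        // Otherwise fall back to a dense walk over every logical position.
        double[] dense = new double[row.size];
        for (int k = 0; k < row.indices.length; k++) {
            dense[row.indices[k]] = row.values[k];
        }
        double acc = mapper.applyAsDouble(dense[0]);
        for (int i = 1; i < dense.length; i++) {
            acc = combiner.applyAsDouble(acc, mapper.applyAsDouble(dense[i]));
        }
        return acc;
    }

    public static void main(String[] args) {
        // Logical row [0, -2, 0, 4, 0] with two stored entries.
        SparseRow row = new SparseRow(5, new int[] {1, 3}, new double[] {-2, 4});
        System.out.println(aggregate(row, PLUS, ABS)); // prints 6.0, touching only 2 entries
    }
}
```

With a mapper such as `x -> x + 1`, where the mapped zero is not 0, the sketch falls back to visiting every position, which is exactly the case where a sparse iterator would give the wrong answer.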




--
This message was sent by Atlassian JIRA
(v6.2#6252)
