[
https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973196#comment-14973196
]
Suneel Marthi commented on MAHOUT-1582:
---------------------------------------
Is this still required post Mahout 0.10.x?
> Create simpler row and column aggregation API at local level
> ------------------------------------------------------------
>
> Key: MAHOUT-1582
> URL: https://issues.apache.org/jira/browse/MAHOUT-1582
> Project: Mahout
> Issue Type: Bug
> Reporter: Ted Dunning
> Assignee: Suneel Marthi
> Labels: legacy, math, scala
>
> The issue is that the current row and column aggregation API makes it
> difficult to do anything but row-by-row aggregation using anonymous classes.
> It has no way to take locality into account, nor to use the well-known
> function definitions in Functions. This rules out many optimizations that
> we want to have. An example is summing the absolute values of the elements
> of each row. With the current API, this is very hard to optimize for sparse
> matrices and for the wrong direction of iteration, but with a different API
> it should be easy.
> What I suggest is an API of this form:
> {code}
> Vector aggregateRows(DoubleDoubleFunction combiner, DoubleFunction mapper)
> {code}
> This will produce a vector with one element per row in the original. The
> nice thing here is that if the matrix is row major, we can iterate over rows
> and accumulate a value for each row using sparsity as available. On the
> other hand, if the matrix is column major, we can keep a vector of
> accumulators and still use sparsity as appropriate.
> The use of sparsity comes in because the matrix code now has control over
> both of the loops involved and also has visibility into properties of the map
> and combine functions. For instance, ABS(0) == 0 so if we combine with PLUS,
> we can use a sparse iterator.
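
For illustration, here is a rough, self-contained sketch of the column-major case described above. It is not Mahout code: java.util.function.DoubleBinaryOperator and DoubleUnaryOperator stand in for DoubleDoubleFunction and DoubleFunction, and the names Combiner, zeroIsRightIdentity, ColumnMajorSparse and visited are hypothetical. It also assumes each row's accumulator starts at 0.0. The sparse path is taken only when mapper(0) == 0 and the combiner reports that combine(x, 0) == x.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

public class AggregateRowsSketch {

    /** Hypothetical combiner that reports whether combine(x, 0) == x, as PLUS does. */
    interface Combiner extends DoubleBinaryOperator {
        boolean zeroIsRightIdentity();
    }

    static final Combiner PLUS = new Combiner() {
        @Override public double applyAsDouble(double a, double b) { return a + b; }
        @Override public boolean zeroIsRightIdentity() { return true; }
    };

    /** Toy column-major sparse matrix: column c stores values[c][k] at row rowIndex[c][k]. */
    static final class ColumnMajorSparse {
        final int rows;
        final int[][] rowIndex;
        final double[][] values;
        long visited; // elements touched by the last aggregateRows call (illustration only)

        ColumnMajorSparse(int rows, int[][] rowIndex, double[][] values) {
            this.rows = rows;
            this.rowIndex = rowIndex;
            this.values = values;
        }

        /** Returns one aggregated value per row; every accumulator starts at 0.0 (assumption). */
        double[] aggregateRows(Combiner combiner, DoubleUnaryOperator mapper) {
            double[] acc = new double[rows];
            visited = 0;
            // Zeros can be skipped only if they map to 0 and combining with 0 changes nothing.
            boolean skipZeros = combiner.zeroIsRightIdentity() && mapper.applyAsDouble(0.0) == 0.0;
            for (int c = 0; c < values.length; c++) {
                if (skipZeros) {
                    // Sparse path: touch only the stored entries of this column.
                    for (int k = 0; k < values[c].length; k++) {
                        int r = rowIndex[c][k];
                        acc[r] = combiner.applyAsDouble(acc[r], mapper.applyAsDouble(values[c][k]));
                        visited++;
                    }
                } else {
                    // Dense fallback: expand the column so zeros are mapped and combined too.
                    double[] column = new double[rows];
                    for (int k = 0; k < values[c].length; k++) {
                        column[rowIndex[c][k]] = values[c][k];
                    }
                    for (int r = 0; r < rows; r++) {
                        acc[r] = combiner.applyAsDouble(acc[r], mapper.applyAsDouble(column[r]));
                        visited++;
                    }
                }
            }
            return acc;
        }
    }

    /** The 3x3 matrix [[1, 0, -2], [0, 0, 0], [-3, 4, 0]] in column-major sparse form. */
    static ColumnMajorSparse example() {
        int[][] rowIndex = {{0, 2}, {2}, {0}};
        double[][] values = {{1.0, -3.0}, {4.0}, {-2.0}};
        return new ColumnMajorSparse(3, rowIndex, values);
    }

    public static void main(String[] args) {
        ColumnMajorSparse m = example();
        // Sum of absolute values per row: abs(0) == 0 and PLUS allow the sparse path.
        System.out.println(Arrays.toString(m.aggregateRows(PLUS, Math::abs))); // [3.0, 0.0, 7.0]
        System.out.println(m.visited); // 4
        // x -> x + 1 maps 0 to 1, so every element has to be visited.
        System.out.println(Arrays.toString(m.aggregateRows(PLUS, x -> x + 1.0))); // [2.0, 3.0, 4.0]
        System.out.println(m.visited); // 9
    }
}
```

The vector of accumulators is what lets a column-major layout be walked in its natural order while still producing one value per row. A row-major version would presumably loop over rows with a single running accumulator instead.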