[
https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572248#action_12572248
]
Ted Dunning commented on MAHOUT-6:
----------------------------------
Btw, the easiest Mahout user story for views is parallel multiplication (=
cooccurrence counting).
To multiply A' * B, one way is to reduce on columns of A modulo some number
that is about 2-5x the number of reducers, which in turn should be about the
number of cores to be used. The map would emit copies of B, one for each
reduce key. Then it would emit columns of A keyed by the column number modulo
the reduce key count.
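As a rough illustration of the keying scheme above, here is a minimal in-memory sketch. It does not use Hadoop at all; the class and method names are made up for this example, and plain double[][] arrays stand in for real matrix types. The point is only that column j of A produces row j of A' * B, so reduce keys that own disjoint sets of columns never write to the same part of the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of the keyed A' * B scheme; hypothetical names, no Hadoop. */
class ParallelMultiplySketch {

    /** Reduce key for a column of A: the column number modulo the key count. */
    static int reduceKey(int column, int keyCount) {
        return column % keyCount;
    }

    /** "Map" step: group the column indices of A by reduce key. */
    static Map<Integer, List<Integer>> assignColumns(int columnsOfA, int keyCount) {
        Map<Integer, List<Integer>> byKey = new HashMap<>();
        for (int j = 0; j < columnsOfA; j++) {
            byKey.computeIfAbsent(reduceKey(j, keyCount), k -> new ArrayList<>()).add(j);
        }
        return byKey;
    }

    /** "Reduce" step for one key: column j of A yields row j of A' * B. */
    static void reduce(double[][] a, double[][] b, List<Integer> columns, double[][] result) {
        int rows = a.length;        // A and B must have the same number of rows
        int colsB = b[0].length;
        for (int j : columns) {
            for (int k = 0; k < rows; k++) {
                double ajk = a[k][j];
                for (int c = 0; c < colsB; c++) {
                    result[j][c] += ajk * b[k][c];
                }
            }
        }
    }

    /** Runs every reduce group in turn; on a cluster each group would be one reduce call. */
    static double[][] multiplyTransposed(double[][] a, double[][] b, int keyCount) {
        double[][] result = new double[a[0].length][b[0].length];
        for (List<Integer> columns : assignColumns(a[0].length, keyCount).values()) {
            reduce(a, b, columns, result);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // prints [[26.0, 30.0], [38.0, 44.0]]
        System.out.println(Arrays.deepToString(multiplyTransposed(a, b, 3)));
    }
}
```

The key count only changes how the columns are grouped, not the product, which is why it can be tuned freely against the number of reducers.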
It is highly desirable to:
a) not write a special serializer for columns of a matrix ... the serializer
for vectors should do.
b) not copy columns of A before serializing them.
Column views give us what we need. By symmetry, we should have row views, of
course. Also, if you have the machinery for row and column views, sub-matrix
views are trivial additions.
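To make points a) and b) concrete, here is a small sketch under stated assumptions: VectorSketch, DenseMatrixSketch and VectorSerializerSketch are hypothetical names invented for this example, not Mahout classes, and the wire format (size followed by the elements) is arbitrary. The serializer is written against java.io.DataOutput, the same interface that Hadoop's Writable.write uses, and it accepts a column view exactly as it would accept any other vector.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical minimal read-only vector interface; not Mahout's actual API. */
interface VectorSketch {
    int size();
    double get(int index);
}

/** Hypothetical row-major dense matrix that hands out row and column views. */
class DenseMatrixSketch {
    final double[] values;
    final int rows;
    final int columns;

    DenseMatrixSketch(int rows, int columns, double[] values) {
        this.rows = rows;
        this.columns = columns;
        this.values = values;
    }

    double get(int row, int column) {
        return values[row * columns + column];
    }

    /** Column view: reads through to the matrix storage, nothing is copied. */
    VectorSketch viewColumn(final int column) {
        return new VectorSketch() {
            public int size() { return rows; }
            public double get(int index) { return DenseMatrixSketch.this.get(index, column); }
        };
    }

    /** Row view, by symmetry. */
    VectorSketch viewRow(final int row) {
        return new VectorSketch() {
            public int size() { return columns; }
            public double get(int index) { return DenseMatrixSketch.this.get(row, index); }
        };
    }
}

class VectorSerializerSketch {
    /** One serializer for every vector, views included: size, then the elements. */
    static void write(VectorSketch v, DataOutput out) throws IOException {
        out.writeInt(v.size());
        for (int i = 0; i < v.size(); i++) {
            out.writeDouble(v.get(i));
        }
    }

    /** Convenience wrapper that serializes into a byte array. */
    static byte[] toBytes(VectorSketch v) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            write(v, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        DenseMatrixSketch a = new DenseMatrixSketch(2, 3, new double[] {1, 2, 3, 4, 5, 6});
        // 4 bytes for the size plus 2 doubles of 8 bytes each: prints 20
        System.out.println(toBytes(a.viewColumn(1)).length);
    }
}
```

The view is the only object created per column; the elements go from the matrix storage straight to the output.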
Views can be done many ways. One way is with a view wrapper, and this can be
used at the abstract matrix level to get views for free for new
implementations. For many kinds of matrix, it is desirable to have a
special-purpose view to avoid the wrapper overhead. Dense matrices based on strided
access to an array of values, for instance, can support views with no
additional mechanism and without any appreciable overhead other than memory
locality issues. Most sparse representations can provide either row or column
views very cheaply as well.
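The two approaches could look roughly like the sketch below. All names here (Vec, AbstractMatrixSketch, StridedVec, StridedDenseMatrix) are hypothetical and chosen only for illustration. The abstract class supplies a wrapper view that any new implementation inherits, while the dense class overrides it with a strided view that indexes the backing array directly as offset + i * stride.

```java
/** Hypothetical vector interface for this sketch; not Mahout's actual API. */
interface Vec {
    int size();
    double get(int i);
    void set(int i, double value);
}

/** New implementations supply only element access and inherit the wrapper view. */
abstract class AbstractMatrixSketch {
    abstract int rows();
    abstract int columns();
    abstract double get(int row, int column);
    abstract void set(int row, int column, double value);

    /** Generic wrapper view: every access is forwarded to the matrix. */
    Vec viewColumn(final int column) {
        return new Vec() {
            public int size() { return rows(); }
            public double get(int i) { return AbstractMatrixSketch.this.get(i, column); }
            public void set(int i, double v) { AbstractMatrixSketch.this.set(i, column, v); }
        };
    }
}

/** Special-purpose view: element i lives at data[offset + i * stride]. */
class StridedVec implements Vec {
    private final double[] data;
    private final int offset;
    private final int stride;
    private final int size;

    StridedVec(double[] data, int offset, int stride, int size) {
        this.data = data;
        this.offset = offset;
        this.stride = stride;
        this.size = size;
    }

    public int size() { return size; }
    public double get(int i) { return data[offset + i * stride]; }
    public void set(int i, double value) { data[offset + i * stride] = value; }
}

/** Row-major dense matrix whose views skip the wrapper entirely. */
class StridedDenseMatrix extends AbstractMatrixSketch {
    private final double[] data;
    private final int rows;
    private final int columns;

    StridedDenseMatrix(int rows, int columns, double[] data) {
        this.rows = rows;
        this.columns = columns;
        this.data = data;
    }

    int rows() { return rows; }
    int columns() { return columns; }
    double get(int row, int column) { return data[row * columns + column]; }
    void set(int row, int column, double value) { data[row * columns + column] = value; }

    /** A column starts at its column index and steps by the row length. */
    @Override
    Vec viewColumn(int column) { return new StridedVec(data, column, columns, rows); }

    /** A row is contiguous: stride 1 starting at row * columns. */
    Vec viewRow(int row) { return new StridedVec(data, row * columns, 1, columns); }

    public static void main(String[] args) {
        StridedDenseMatrix m = new StridedDenseMatrix(2, 3, new double[] {1, 2, 3, 4, 5, 6});
        Vec column = m.viewColumn(2);
        column.set(1, 9);                       // writes through to the matrix
        System.out.println(m.get(1, 2));        // prints 9.0
    }
}
```

In a row-major layout the column view has a stride equal to the row length, which is where the memory locality cost mentioned above comes from; the row view is contiguous.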
> Need a matrix implementation
> ----------------------------
>
> Key: MAHOUT-6
> URL: https://issues.apache.org/jira/browse/MAHOUT-6
> Project: Mahout
> Issue Type: New Feature
> Reporter: Ted Dunning
> Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different
> reducers
> g) a reasonable set of matrix operations should be supported, these should
> eventually include:
>      simple matrix-matrix and matrix-vector and matrix-scalar linear algebra
>      operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>      row and column sums
>      generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u +
>      beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations