[ 
https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571966#action_12571966
 ] 

Ted Dunning commented on MAHOUT-6:
----------------------------------


Regarding the 6b diff, the intent of the view functions is to return a 
reference to the same underlying storage so that in-place updates of parts of a 
matrix or vector can be done.  This is actually pretty important for 
performance in some cases and it massively simplifies the API.

This means that it is much simpler for the Dense1D implementation to have a 
reference to a double[], an offset and a stride.  That means that views are 
easy.  Likewise, the Dense2D implementation should have a reference to a 
double[], an offset and a column and row stride.  This allows many views such 
as transpose, diagonals, rows and columns to be really simple.  It also 
allows column or row major memory layout (if we ever get to the point we care 
that much).  You can even handle banded matrices pretty well with this layout, 
although you need a teensy bit of logic to make sure that out-of-band 
references return zeros.  
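
As a rough illustration of the offset-and-stride layout described above, here 
is a minimal Java sketch.  The class name StridedMatrix and the view method 
names are hypothetical and are not taken from the MAHOUT-6 diffs.  Views are 
returned as matrices (a diagonal comes back as an n x 1 matrix) purely to keep 
the sketch short, and bounds checking is left out.

```java
// Hypothetical sketch: names are illustrative, not from the MAHOUT-6 patches.
final class StridedMatrix {
    private final double[] data;  // storage shared by this matrix and all its views
    private final int offset;     // index in data of element (0, 0)
    private final int rows;
    private final int cols;
    private final int rowStride;  // step in data between consecutive rows
    private final int colStride;  // step in data between consecutive columns

    StridedMatrix(double[] data, int offset, int rows, int cols,
                  int rowStride, int colStride) {
        this.data = data;
        this.offset = offset;
        this.rows = rows;
        this.cols = cols;
        this.rowStride = rowStride;
        this.colStride = colStride;
    }

    // Fresh row-major matrix: a row is cols elements long, columns are adjacent.
    static StridedMatrix dense(int rows, int cols) {
        return new StridedMatrix(new double[rows * cols], 0, rows, cols, cols, 1);
    }

    int rows() { return rows; }
    int cols() { return cols; }

    // No bounds checks here; this sketch only shows the index arithmetic.
    double get(int r, int c) { return data[offset + r * rowStride + c * colStride]; }
    void set(int r, int c, double v) { data[offset + r * rowStride + c * colStride] = v; }

    // Transpose: swap the dimensions and the strides; no data is copied.
    StridedMatrix viewTranspose() {
        return new StridedMatrix(data, offset, cols, rows, colStride, rowStride);
    }

    // Row r as a 1 x cols view: only the offset moves.
    StridedMatrix viewRow(int r) {
        return new StridedMatrix(data, offset + r * rowStride, 1, cols, rowStride, colStride);
    }

    // Column c as a rows x 1 view.
    StridedMatrix viewColumn(int c) {
        return new StridedMatrix(data, offset + c * colStride, rows, 1, rowStride, colStride);
    }

    // Main diagonal as an n x 1 view: each step moves one row and one column.
    StridedMatrix viewDiagonal() {
        int n = Math.min(rows, cols);
        return new StridedMatrix(data, offset, n, 1, rowStride + colStride, colStride);
    }
}
```

Because every view holds the same double[], a set() through the transpose, a 
row, a column or the diagonal is visible through the original matrix, which is 
the in-place update behavior the views are meant to provide.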

The purpose, btw, of the get/set Quick methods is that if there is some sort of 
size checking (as there really has to be since array bounds checking won't 
necessarily catch out of range references in views), then the quick alternatives 
can be used to avoid the checks.  This allows the range checks to be factored 
out of some inner loops with obvious benefit in any case where the compiler is 
less than genius level.
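
A small Java sketch of the checked versus quick accessors, under the same 
offset-and-stride assumption.  The class StridedVector and its scale and dot 
helpers are hypothetical names used only for illustration; only the get/set 
and quick accessor split comes from the discussion above.

```java
// Hypothetical sketch: names are illustrative, not from the MAHOUT-6 patches.
final class StridedVector {
    private final double[] data;  // possibly shared with other views
    private final int offset;     // index in data of element 0
    private final int stride;     // step in data between consecutive elements
    private final int size;       // logical length of this view

    StridedVector(double[] data, int offset, int stride, int size) {
        this.data = data;
        this.offset = offset;
        this.stride = stride;
        this.size = size;
    }

    int size() { return size; }

    // Checked accessors: validate against the view's size, not the array's length.
    double get(int i) { checkIndex(i); return getQuick(i); }
    void set(int i, double v) { checkIndex(i); setQuick(i, v); }

    // Quick accessors: no check; the caller promises 0 <= i < size.
    double getQuick(int i) { return data[offset + i * stride]; }
    void setQuick(int i, double v) { data[offset + i * stride] = v; }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
    }

    // The loop bound already guarantees valid indices, so use the quick path.
    void scale(double alpha) {
        for (int i = 0; i < size; i++) {
            setQuick(i, alpha * getQuick(i));
        }
    }

    // One size check up front replaces a check on every element access.
    double dot(StridedVector other) {
        if (other.size != size) {
            throw new IllegalArgumentException("size mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < size; i++) {
            sum += getQuick(i) * other.getQuick(i);
        }
        return sum;
    }
}
```

For example, a view of size 2 with stride 2 over a six-element array covers 
array slots 0 and 2.  Index 2 of that view maps to slot 4, which is a legal 
array index, so the JVM's own bounds check stays silent and only the explicit 
check in get() rejects it.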

> Need a matrix implementation
> ----------------------------
>
>                 Key: MAHOUT-6
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-6
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>         Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different 
> reducers
> g) a reasonable set of matrix operations should be supported, these should 
> eventually include:
>     simple matrix-matrix and matrix-vector and matrix-scalar linear algebra 
> operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>     row and column sums  
>     generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + 
> beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
