[ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571273#action_12571273 ]

Ted Dunning commented on MAHOUT-6:
----------------------------------


My own, non-portable, non-releasable matrix package uses a class structure 
similar to Colt's, but with a simpler extension hierarchy:

interface Matrix1D -- the basic interface including numerous convenience 
functions

abstract class AbstractMatrix1D implements Matrix1D -- implementations of 
generic capabilities like sum of elements and dot products

class DenseMatrix1D extends AbstractMatrix1D -- implements a vector as an array 
of doubles with an offset and stride. Used for vectors, views of sub-vectors, 
and row or column views of dense matrices.

class SparseBinaryVector extends AbstractMatrix1D -- implements a vector whose 
elements are only 0 or 1. Only the positions of the 1's are kept; their values 
are never stored because they are known to be 1. This is really a bit-vector 
implemented as a closed hash table that only holds integers.

class SparseDoubleVector extends AbstractMatrix1D -- implements a vector that 
only stores non-zero doubles
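A minimal Java sketch of the 1D hierarchy described above. The method names (`size`, `get`, `set`, `sum`, `dot`, `viewPart`) are hypothetical, not taken from the author's package. It shows generic operations written once in the abstract class, and a dense vector whose offset and stride let a sub-vector view share its parent's array.

```java
// Hypothetical sketch; class names follow the comment, method names are assumptions.
interface Matrix1D {
    int size();
    double get(int i);
    void set(int i, double value);
    double sum();                // convenience function
    double dot(Matrix1D other);  // convenience function
}

abstract class AbstractMatrix1D implements Matrix1D {
    // Generic versions written only in terms of size() and get();
    // subclasses may override them with faster implementations.
    @Override
    public double sum() {
        double total = 0.0;
        for (int i = 0; i < size(); i++) {
            total += get(i);
        }
        return total;
    }

    @Override
    public double dot(Matrix1D other) {
        if (other.size() != size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        double total = 0.0;
        for (int i = 0; i < size(); i++) {
            total += get(i) * other.get(i);
        }
        return total;
    }
}

class DenseMatrix1D extends AbstractMatrix1D {
    private final double[] data; // backing store, possibly shared with other views
    private final int offset;    // position of element 0 in data
    private final int stride;    // distance in data between consecutive elements
    private final int size;

    DenseMatrix1D(int size) {
        this(new double[size], 0, 1, size);
    }

    DenseMatrix1D(double[] data, int offset, int stride, int size) {
        this.data = data;
        this.offset = offset;
        this.stride = stride;
        this.size = size;
    }

    @Override public int size() { return size; }
    @Override public double get(int i) { return data[offset + i * stride]; }
    @Override public void set(int i, double value) { data[offset + i * stride] = value; }

    // View of elements [from, from + length); no data is copied.
    DenseMatrix1D viewPart(int from, int length) {
        return new DenseMatrix1D(data, offset + from * stride, stride, length);
    }
}
```

Because `viewPart` only adjusts the offset, a write through the view is visible in the parent vector. A strided view such as `new DenseMatrix1D(backing, 1, 2, 3)` picks out every other element, which is how a column view of a row-major matrix would work.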


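The SparseBinaryVector idea can be sketched as a closed hash table (open addressing) of int indices, where a value of 1 is implied by an index being present. Linear probing, the multiplicative hash constant, and the method names are assumptions for illustration. The class stands alone rather than extending AbstractMatrix1D so that it compiles by itself.

```java
import java.util.Arrays;

// Hypothetical sketch: stores only the indices of the 1's, never any values.
class SparseBinaryVector {
    private static final int EMPTY = -1; // marks an unused slot; valid indices are >= 0
    private final int size;              // logical length of the vector
    private int[] slots;                 // open-addressing table of stored indices
    private int count;                   // number of 1's stored

    SparseBinaryVector(int size) {
        this.size = size;
        this.slots = new int[16];        // length is always a power of two
        Arrays.fill(slots, EMPTY);
    }

    int nonZeroCount() { return count; }

    // 1.0 if the index is present in the table, else 0.0.
    double get(int index) {
        checkIndex(index);
        return slots[findSlot(slots, index)] == index ? 1.0 : 0.0;
    }

    // Sets the element at index to 1.
    void setOne(int index) {
        checkIndex(index);
        int slot = findSlot(slots, index);
        if (slots[slot] == index) {
            return;                      // already 1
        }
        slots[slot] = index;
        count++;
        if (2 * count > slots.length) {
            rehash();                    // keep the load factor at or below 1/2
        }
    }

    // Dot product with a dense array: just sums the dense entries at the stored indices.
    double dot(double[] dense) {
        double total = 0.0;
        for (int stored : slots) {
            if (stored != EMPTY) {
                total += dense[stored];
            }
        }
        return total;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    // Linear probing: returns the slot holding index, or the first empty slot found.
    private static int findSlot(int[] table, int index) {
        int mask = table.length - 1;
        int slot = (index * 0x9E3779B9) & mask;
        while (table[slot] != EMPTY && table[slot] != index) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    private void rehash() {
        int[] bigger = new int[slots.length * 2];
        Arrays.fill(bigger, EMPTY);
        for (int stored : slots) {
            if (stored != EMPTY) {
                bigger[findSlot(bigger, stored)] = stored;
            }
        }
        slots = bigger;
    }
}
```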

interface Matrix2D -- the basic interface including convenience functions

abstract class AbstractMatrix2D implements Matrix2D -- a few universal 
implementations of convenience functions mostly in terms of BLAS ops

abstract class AbstractSparseMatrix2D extends AbstractMatrix2D -- efficient 
implementations of BLAS ops for generic sparse matrices

abstract class AbstractDenseMatrix2D extends AbstractMatrix2D -- reasonably 
efficient implementations of BLAS ops for dense matrices

class DenseMatrix2D extends AbstractDenseMatrix2D -- matrix of doubles 
implemented using a single 1D array with generic stride and offset. Also used 
to hold transposed views and some kinds of sub-matrix views.

class DoublyIndexedSparseBinary2D -- sparse matrix whose non-zero elements are 
all 1.  Fast row and column views are available through redundant storage.

class SparseRowDouble2D -- sparse matrix with general element values whose rows 
are accessible quickly.  Implemented as an array of SparseDouble1D vectors.

class SparseColumnDouble2D -- sparse matrix with general element values whose 
columns are accessible quickly.
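The DenseMatrix2D layout can be sketched as one array plus an offset and separate row and column strides. With that representation a transposed view is just a swap of the two strides and dimensions, and a contiguous sub-matrix view is a new offset, so neither copies data. The method names `viewTranspose` and `viewPart` are hypothetical, and bounds checks are omitted for brevity.

```java
// Hypothetical sketch of a dense matrix stored in a single 1D array.
class DenseMatrix2D {
    private final double[] data;
    private final int rows;
    private final int columns;
    private final int offset;       // position of element (0, 0) in data
    private final int rowStride;    // step in data when moving down one row
    private final int columnStride; // step in data when moving right one column

    DenseMatrix2D(int rows, int columns) {
        // Row-major layout: element (i, j) starts out at i * columns + j.
        this(new double[rows * columns], rows, columns, 0, columns, 1);
    }

    private DenseMatrix2D(double[] data, int rows, int columns,
                          int offset, int rowStride, int columnStride) {
        this.data = data;
        this.rows = rows;
        this.columns = columns;
        this.offset = offset;
        this.rowStride = rowStride;
        this.columnStride = columnStride;
    }

    int rows() { return rows; }
    int columns() { return columns; }

    private int index(int i, int j) { return offset + i * rowStride + j * columnStride; }

    double get(int i, int j) { return data[index(i, j)]; }
    void set(int i, int j, double value) { data[index(i, j)] = value; }

    // Transposed view: swap dimensions and strides; the data array is shared.
    DenseMatrix2D viewTranspose() {
        return new DenseMatrix2D(data, columns, rows, offset, columnStride, rowStride);
    }

    // height x width sub-matrix starting at (row, column); the data array is shared.
    DenseMatrix2D viewPart(int row, int column, int height, int width) {
        return new DenseMatrix2D(data, height, width, index(row, column), rowStride, columnStride);
    }
}
```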

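A row-oriented sparse matrix such as SparseRowDouble2D can be sketched as a list of per-row sparse vectors. Here a `HashMap` from column index to value stands in for a real sparse double vector; that substitution and the method names are assumptions. The point it illustrates is that a matrix-vector product only visits stored non-zeros, and a whole row is available in one lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each row maps column index -> non-zero value.
class SparseRowDouble2D {
    private final int columns;
    private final List<Map<Integer, Double>> rows = new ArrayList<>();

    SparseRowDouble2D(int rowCount, int columns) {
        this.columns = columns;
        for (int i = 0; i < rowCount; i++) {
            rows.add(new HashMap<>());
        }
    }

    double get(int i, int j) {
        return rows.get(i).getOrDefault(j, 0.0);
    }

    void set(int i, int j, double value) {
        if (value == 0.0) {
            rows.get(i).remove(j);   // store only non-zeros
        } else {
            rows.get(i).put(j, value);
        }
    }

    // Fast row access: a read-only view of the stored entries of row i.
    Map<Integer, Double> viewRow(int i) {
        return Collections.unmodifiableMap(rows.get(i));
    }

    // y = A x; the work is proportional to the number of stored non-zeros.
    double[] times(double[] x) {
        if (x.length != columns) {
            throw new IllegalArgumentException("size mismatch");
        }
        double[] y = new double[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            double total = 0.0;
            for (Map.Entry<Integer, Double> e : rows.get(i).entrySet()) {
                total += e.getValue() * x[e.getKey()];
            }
            y[i] = total;
        }
        return y;
    }
}
```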

Functions -- Matrices support updates using functional objects so that generic 
in-place updates can be done very efficiently. I stole this idea from Colt. 
In fact, my matrix package uses the Colt Functions object for my own matrix 
implementations, which is one reason I can't easily distribute my own 
matrices.
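The functional-update idea can be sketched without Colt by using the JDK's `DoubleUnaryOperator` and `DoubleBinaryOperator` as stand-ins for Colt's function objects; that substitution, the class name `FunctionalVector`, and the `plusMult` helper are assumptions for illustration. A single generic `assign` loop then covers scaling, axpy-style updates, and any other element-wise rule without allocating temporaries.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: JDK functional interfaces stand in for Colt's function objects.
class FunctionalVector {
    private final double[] data;

    FunctionalVector(double... values) {
        this.data = values.clone();
    }

    double get(int i) { return data[i]; }

    // In place: x[i] = f(x[i]) for every element.
    FunctionalVector assign(DoubleUnaryOperator f) {
        for (int i = 0; i < data.length; i++) {
            data[i] = f.applyAsDouble(data[i]);
        }
        return this;
    }

    // In place: x[i] = f(x[i], y[i]) for every element.
    FunctionalVector assign(FunctionalVector y, DoubleBinaryOperator f) {
        if (y.data.length != data.length) {
            throw new IllegalArgumentException("size mismatch");
        }
        for (int i = 0; i < data.length; i++) {
            data[i] = f.applyAsDouble(data[i], y.data[i]);
        }
        return this;
    }

    // Function factory: (a, b) -> a + alpha * b, i.e. one axpy step per element.
    static DoubleBinaryOperator plusMult(double alpha) {
        return (a, b) -> a + alpha * b;
    }
}
```

For example, `x.assign(y, FunctionalVector.plusMult(0.5))` performs x += 0.5 y in one pass, and `x.assign(v -> v / 6)` rescales x in place.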

Any comments on this general structure? For machine learning, I have had very 
little call for complex numbers. Some might take issue with my assumption that 
floats pretty much don't exist, but for large problems I find it more 
important to retain precision than to save memory.
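As a concrete illustration of the precision argument (this example is not from the original comment): a float has a 24-bit significand, so once a float accumulator reaches 2^24 = 16,777,216, adding 1.0f rounds back to the same value, while a double keeps counting exactly well past that point. Large sums over big data sets hit this kind of limit.

```java
// Illustration: counting with a float accumulator stalls at 2^24.
class PrecisionDemo {
    static float floatCount(int n) {
        float total = 0.0f;
        for (int i = 0; i < n; i++) {
            total += 1.0f;   // once total == 2^24, total + 1 rounds back to 2^24
        }
        return total;
    }

    static double doubleCount(int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += 1.0;    // exact for any count below 2^53
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 20_000_000;
        System.out.println(floatCount(n));  // prints 1.6777216E7
        System.out.println(doubleCount(n)); // prints 2.0E7
    }
}
```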


> Need a matrix implementation
> ----------------------------
>
>                 Key: MAHOUT-6
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-6
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different 
> reducers
> g) a reasonable set of matrix operations should be supported, these should 
> eventually include:
>     simple matrix-matrix and matrix-vector and matrix-scalar linear algebra 
> operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>     row and column sums  
>     generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + 
> beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations
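The generalized level 2 and level 3 BLAS primitives in requirement (g) can be sketched naively on plain `double[][]` arrays. This is not a tuned BLAS, the class and method names are hypothetical, and an explicit alpha is assumed on the level 2 form as well. Following the reference BLAS convention, beta = 0 overwrites the output instead of multiplying it, so stale NaNs in the output do not leak through.

```java
// Hypothetical naive sketch; assumes non-empty, rectangular arrays.
final class NaiveBlas {
    private NaiveBlas() {}

    // Level 3: C = alpha * A * B + beta * C, updating C in place.
    static void gemm(double alpha, double[][] a, double[][] b, double beta, double[][] c) {
        int n = a.length, k = b.length, m = b[0].length;
        if (a[0].length != k || c.length != n || c[0].length != m) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                c[i][j] = beta == 0.0 ? 0.0 : beta * c[i][j];
            }
            // i-p-j loop order walks rows of B and C contiguously.
            for (int p = 0; p < k; p++) {
                double aip = alpha * a[i][p];
                for (int j = 0; j < m; j++) {
                    c[i][j] += aip * b[p][j];
                }
            }
        }
    }

    // Level 2: v = alpha * A * u + beta * v, updating v in place.
    static void gemv(double alpha, double[][] a, double[] u, double beta, double[] v) {
        if (a[0].length != u.length || a.length != v.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        for (int i = 0; i < a.length; i++) {
            double total = 0.0;
            for (int j = 0; j < u.length; j++) {
                total += a[i][j] * u[j];
            }
            v[i] = alpha * total + (beta == 0.0 ? 0.0 : beta * v[i]);
        }
    }
}
```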

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.