[
https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571273#action_12571273
]
Ted Dunning commented on MAHOUT-6:
----------------------------------
My own, non-portable, non-releasable matrix package uses a class structure
something like Colt, but with a simpler extension structure:
interface Matrix1D -- the basic interface including numerous convenience
functions
abstract class AbstractMatrix1D implements Matrix1D -- implementations of
generic capabilities like sum of elements and dot products
class DenseMatrix1D extends AbstractMatrix1D -- implements vector as an array
of doubles with an offset and stride. Used for vectors, for views of
sub-vectors, and for row or column views of dense matrices.
class SparseBinaryVector extends AbstractMatrix1D -- implements vector whose
elements are all 0 or 1. Only the indices of the 1's are stored; the values
themselves are not, since they are known to be 1. This is really a bit-vector
implemented as a closed hash table that only holds integers.
class SparseDoubleVector extends AbstractMatrix1D -- implements vector that
only stores non-zero doubles
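A minimal sketch of how the 1D types described above could fit together. The type names are taken from the list; every method name and signature (size, get, set, sum, dot, viewPart) is an assumption made for illustration, and java.util.HashSet stands in for the closed hash table of integers.

```java
// Sketch only: the type names come from the list above; all method names
// and signatures are assumptions, not the original package's API.
import java.util.HashSet;
import java.util.Set;

interface Matrix1D {
    int size();
    double get(int index);
    void set(int index, double value);
    double sum();
    double dot(Matrix1D other);
}

/** Generic operations written once against get() and size(). */
abstract class AbstractMatrix1D implements Matrix1D {
    @Override
    public double sum() {
        double total = 0.0;
        for (int i = 0; i < size(); i++) {
            total += get(i);
        }
        return total;
    }

    @Override
    public double dot(Matrix1D other) {
        if (other.size() != size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        double total = 0.0;
        for (int i = 0; i < size(); i++) {
            total += get(i) * other.get(i);
        }
        return total;
    }
}

/** Dense vector or view: element i lives at values[offset + i * stride]. */
class DenseMatrix1D extends AbstractMatrix1D {
    private final double[] values;
    private final int offset;
    private final int stride;
    private final int size;

    DenseMatrix1D(double[] values) {
        this(values, 0, 1, values.length);
    }

    DenseMatrix1D(double[] values, int offset, int stride, int size) {
        this.values = values;
        this.offset = offset;
        this.stride = stride;
        this.size = size;
    }

    @Override public int size() { return size; }
    @Override public double get(int index) { return values[position(index)]; }
    @Override public void set(int index, double value) { values[position(index)] = value; }

    /** View of `length` elements starting at `start`; shares storage with this vector. */
    DenseMatrix1D viewPart(int start, int length) {
        if (start < 0 || length < 0 || start + length > size) {
            throw new IndexOutOfBoundsException("bad view range");
        }
        return new DenseMatrix1D(values, offset + start * stride, stride, length);
    }

    private int position(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return offset + index * stride;
    }
}

/** Binary vector that stores only the indices of its 1's. */
class SparseBinaryVector extends AbstractMatrix1D {
    private final int size;
    // Stand-in for a closed hash table that only holds integers.
    private final Set<Integer> ones = new HashSet<>();

    SparseBinaryVector(int size) { this.size = size; }

    @Override public int size() { return size; }
    @Override public double get(int index) { return ones.contains(index) ? 1.0 : 0.0; }

    @Override
    public void set(int index, double value) {
        if (value == 1.0) {
            ones.add(index);
        } else if (value == 0.0) {
            ones.remove(index);
        } else {
            throw new IllegalArgumentException("binary vector holds only 0 or 1");
        }
    }
}
```

With this layout, a column view of a row-major 2 x 3 array is just new DenseMatrix1D(data, 1, 3, 2): offset 1, stride 3, two elements. No copying is needed, which is presumably why one class can serve vectors, sub-vector views, and matrix row or column views.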
interface Matrix2D -- the basic interface including convenience functions
abstract class AbstractMatrix2D implements Matrix2D -- a few universal
implementations of convenience functions, mostly in terms of BLAS ops
abstract class AbstractSparseMatrix2D extends AbstractMatrix2D -- efficient
implementations of BLAS ops for generic sparse matrices
abstract class AbstractDenseMatrix2D extends AbstractMatrix2D -- reasonably
efficient implementations of BLAS ops for dense matrices
class DenseMatrix2D extends AbstractDenseMatrix2D -- matrix of doubles
implemented using a single 1D array with generic stride and offset. Also used
to hold transposed views and some kinds of sub-matrix views.
class DoublyIndexedSparseBinary2D -- sparse matrix whose non-zero elements are
all 1. Fast row and column views are available through redundant storage.
class SparseRowDouble2D -- sparse matrix with general element values whose rows
are accessible quickly. Implemented as an array of SparseDoubleVector vectors.
class SparseColumnDouble2D -- sparse matrix with general element values whose
columns are accessible quickly.
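To illustrate the "single 1D array with generic stride and offset" idea, here is a hedged sketch of a dense matrix in which transposed and sub-matrix views share storage with the original. Only the class name DenseMatrix2D comes from the list above; the method names (viewTranspose, viewPart, get, set) and the row-major default layout are assumptions.

```java
// Sketch only: DenseMatrix2D is named in the list above, but all method
// names, signatures, and the row-major default layout are assumptions.
class DenseMatrix2D {
    private final double[] values;
    private final int rows;
    private final int columns;
    private final int offset;
    private final int rowStride;
    private final int columnStride;

    /** Fresh row-major matrix: element (r, c) lives at values[r * columns + c]. */
    DenseMatrix2D(int rows, int columns) {
        this(new double[rows * columns], rows, columns, 0, columns, 1);
    }

    private DenseMatrix2D(double[] values, int rows, int columns,
                          int offset, int rowStride, int columnStride) {
        this.values = values;
        this.rows = rows;
        this.columns = columns;
        this.offset = offset;
        this.rowStride = rowStride;
        this.columnStride = columnStride;
    }

    int rows() { return rows; }
    int columns() { return columns; }

    double get(int row, int column) { return values[position(row, column)]; }
    void set(int row, int column, double value) { values[position(row, column)] = value; }

    /** Transposed view: swap the dimensions and the strides, copy nothing. */
    DenseMatrix2D viewTranspose() {
        return new DenseMatrix2D(values, columns, rows, offset, columnStride, rowStride);
    }

    /** Sub-matrix view of size height x width whose top-left corner is (row, column). */
    DenseMatrix2D viewPart(int row, int column, int height, int width) {
        if (row < 0 || column < 0 || height < 0 || width < 0
                || row + height > rows || column + width > columns) {
            throw new IndexOutOfBoundsException("bad view range");
        }
        int newOffset = offset + row * rowStride + column * columnStride;
        return new DenseMatrix2D(values, height, width, newOffset, rowStride, columnStride);
    }

    private int position(int row, int column) {
        if (row < 0 || row >= rows || column < 0 || column >= columns) {
            throw new IndexOutOfBoundsException("(" + row + ", " + column + ")");
        }
        return offset + row * rowStride + column * columnStride;
    }
}
```

Because views share the backing array, a write through the transposed view is visible in the original matrix. That is cheap, but it also means callers have to know whether they hold a view or a copy.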
Functions --- Matrices support updates using functional objects so that generic
in-place updates can be done very efficiently. I stole this idea from Colt.
In fact, my matrix package uses the Colt Functions object for my own matrix
implementations, which is one reason I can't distribute my own matrices very
easily.
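The functional-update idea can be sketched without Colt. The example below swaps in the JDK's java.util.function interfaces for Colt's function objects; the class name DoubleArrayVector and the method name assign are hypothetical, chosen only for this illustration.

```java
// Sketch only: uses JDK functional interfaces in place of Colt's function
// objects. DoubleArrayVector and assign are hypothetical names.
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

class DoubleArrayVector {
    private final double[] values;

    DoubleArrayVector(double... values) {
        this.values = values.clone();
    }

    int size() { return values.length; }
    double get(int index) { return values[index]; }

    /** In-place update: x[i] = f(x[i]). */
    DoubleArrayVector assign(DoubleUnaryOperator f) {
        for (int i = 0; i < values.length; i++) {
            values[i] = f.applyAsDouble(values[i]);
        }
        return this;
    }

    /** In-place update: x[i] = f(x[i], y[i]). */
    DoubleArrayVector assign(DoubleArrayVector other, DoubleBinaryOperator f) {
        if (other.size() != size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        for (int i = 0; i < values.length; i++) {
            values[i] = f.applyAsDouble(values[i], other.values[i]);
        }
        return this;
    }
}
```

For example, x.assign(v -> v * 2).assign(y, (a, b) -> a + b) computes 2x + y in place, with no temporary vector allocated along the way.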
Any comments on this general structure? For machine learning, I have had very
little call for complex numbers. Some might take issue with my assumption that
floats pretty much just don't exist, but for large problems, I find it more
important to retain precision than to save memory.
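One concrete way to see the precision concern, offered as an illustration and not as something taken from the package described above: a float has a 24-bit significand, so a float accumulator that is incremented by 1 stops growing once it reaches 2^24 = 16,777,216. A double accumulator counts the same 20 million steps exactly. The class and method names below are hypothetical.

```java
// Illustration only: PrecisionDemo and its methods are hypothetical names.
class PrecisionDemo {
    /** Adds 1.0f to a float accumulator `steps` times. */
    static float countInFloat(int steps) {
        float total = 0.0f;
        for (int i = 0; i < steps; i++) {
            total += 1.0f; // has no effect once total reaches 2^24
        }
        return total;
    }

    /** Adds 1.0 to a double accumulator `steps` times. */
    static double countInDouble(int steps) {
        double total = 0.0;
        for (int i = 0; i < steps; i++) {
            total += 1.0;
        }
        return total;
    }

    public static void main(String[] args) {
        int steps = 20_000_000;
        System.out.println("float:  " + countInFloat(steps));
        System.out.println("double: " + countInDouble(steps));
    }
}
```

Sums over tens of millions of elements are routine at the scale Mahout targets, so this kind of silent stall is a plausible argument for doubles as the default element type.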
> Need a matrix implementation
> ----------------------------
>
> Key: MAHOUT-6
> URL: https://issues.apache.org/jira/browse/MAHOUT-6
> Project: Mahout
> Issue Type: New Feature
> Reporter: Ted Dunning
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different
> reducers
> g) a reasonable set of matrix operations should be supported, these should
> eventually include:
> simple matrix-matrix and matrix-vector and matrix-scalar linear algebra
> operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
> row and column sums
> generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u +
> beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations