[ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571883#action_12571883 ]
Paul Elschot commented on MAHOUT-6:
-----------------------------------
I had a quick look at the latest patch, 6b.diff.
It looks good, but I see one possible issue in there: the use of Java
interfaces.
The problem with interfaces is that once they are implemented somewhere outside
of a code base, there is no way to change them inside that code base without
breaking something outside. In other words, public interfaces are forever, and
it may be a bit too soon for that.
So the question is whether it would be better to use public abstract classes
instead of public interfaces.
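To illustrate the difference, here is a minimal sketch. All names in it (AbstractMatrix, ExternalDenseMatrix, rowSum) are hypothetical and are not taken from the patch. It assumes a later release wants to add a row sum operation: with an abstract class, the new method can ship with a body, so a subclass written elsewhere against the earlier version keeps working unchanged.

```java
// Hypothetical names throughout; none of these types come from 6b.diff.

// A library type published as an abstract class rather than an interface.
abstract class AbstractMatrix {
    abstract int numRows();
    abstract int numCols();
    abstract double get(int row, int col);

    // Imagine this method was added in a later release. Because it has a
    // body built only on the existing abstract methods, subclasses written
    // against the earlier release do not need to change.
    double rowSum(int row) {
        double sum = 0.0;
        for (int col = 0; col < numCols(); col++) {
            sum += get(row, col);
        }
        return sum;
    }
}

// Stands in for code living outside the code base, written before
// rowSum existed. It never mentions rowSum.
class ExternalDenseMatrix extends AbstractMatrix {
    private final double[][] values;

    ExternalDenseMatrix(double[][] values) {
        this.values = values;
    }

    @Override
    int numRows() {
        return values.length;
    }

    @Override
    int numCols() {
        return values.length == 0 ? 0 : values[0].length;
    }

    @Override
    double get(int row, int col) {
        return values[row][col];
    }
}
```

Had AbstractMatrix been an interface and rowSum a new abstract method on it, ExternalDenseMatrix would have to be edited before it could work with the new release. The cost of the abstract class is that an implementation can extend only one of them.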
Also I'd like to have a sparse 2D matrix implementation on top of a Lucene
index with term vectors, but that appears to be no problem, and it's better
handled as another issue.
It's related to labeling rows and columns though. Lucene docs could be labeled
by a primary key value, and Lucene features could be labeled by their term
values, possibly combined with the term field. (Roughly, in Lucene, a document
consists of several fields, each field having indexed terms. A term vector in
Lucene consists of all the term values and frequencies for a field of a
document.)
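The labeling idea can be sketched roughly as follows. This is only an illustration under stated assumptions: it does not use the Lucene API at all, plain maps stand in for the index, and the class LabeledSparseMatrix, its methods, and the "field:term" label format are all hypothetical. Each document's primary key labels a row, and each field and term pair labels a column.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for a Lucene index with term
// vectors. Nothing here is a Lucene or Mahout API.
class LabeledSparseMatrix {
    // row label (primary key) -> column label ("field:term") -> frequency
    private final Map<String, Map<String, Integer>> rows = new HashMap<>();

    // Combine the term field and the term value into one column label.
    static String columnLabel(String field, String term) {
        return field + ":" + term;
    }

    // Store one term vector: all term values and frequencies for one
    // field of one document.
    void addTermVector(String primaryKey, String field,
                       Map<String, Integer> termFreqs) {
        Map<String, Integer> row = rows.get(primaryKey);
        if (row == null) {
            row = new HashMap<>();
            rows.put(primaryKey, row);
        }
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            row.put(columnLabel(field, e.getKey()), e.getValue());
        }
    }

    // Entries that were never stored read as zero, which is what makes
    // the matrix sparse.
    int get(String primaryKey, String field, String term) {
        Map<String, Integer> row = rows.get(primaryKey);
        if (row == null) {
            return 0;
        }
        Integer freq = row.get(columnLabel(field, term));
        return freq == null ? 0 : freq;
    }
}
```

An implementation on a real index would read rows from stored term vectors instead of copying them into maps; the sketch only shows how the row and column labels could be formed.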
> Need a matrix implementation
> ----------------------------
>
> Key: MAHOUT-6
> URL: https://issues.apache.org/jira/browse/MAHOUT-6
> Project: Mahout
> Issue Type: New Feature
> Reporter: Ted Dunning
> Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for Hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different
> reducers
> g) a reasonable set of matrix operations should be supported, these should
> eventually include:
> simple matrix-matrix and matrix-vector and matrix-scalar linear algebra
> operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
> row and column sums
> generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u +
> beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations