[ 
https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Eastman updated MAHOUT-6:
------------------------------

    Attachment: MAHOUT-6b.diff

Here's a minimal implementation of Matrix1D methods and tests that will be 
adequate to replace Point and Float[] in the clustering code. I did not 
implement all of Ted's suggestions as I did not understand them and/or there 
were missing classes. There are a couple of issues it raises:

- I made all the methods side-effect free, returning new instances from all 
operations. This will exercise the garbage collector but eliminate difficult 
debugging problems. It also seems consistent with the functional programming 
roots of map/reduce.
- I did not implement any checks for matching cardinality or division by zero, 
so these still need to be added. The question in my mind is whether to use 
checked exceptions or runtime exceptions. I can't think of a valid use case 
for the former, but I await comments before acting.
- I added divide(), normalize(), asFormatString() and a static decodeFormat 
method, all of which are needed by clustering.
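
To make the first two points concrete, here is a rough sketch of what 
side-effect-free operations with runtime (unchecked) exceptions could look 
like. It is not the code in MAHOUT-6b.diff: the class name ImmutableVector, 
the use of double values, the choice of IllegalArgumentException and 
ArithmeticException, and the Euclidean norm in normalize() are all 
assumptions made for illustration.

```java
// Hypothetical sketch only; not the Matrix1D class from the attached patch.
final class ImmutableVector {
    private final double[] values;

    ImmutableVector(double... values) {
        // Defensive copy so callers cannot mutate the vector afterwards.
        this.values = values.clone();
    }

    int cardinality() {
        return values.length;
    }

    double get(int index) {
        return values[index];
    }

    // Returns a new vector; neither operand is modified.
    ImmutableVector plus(ImmutableVector other) {
        requireSameCardinality(other);
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = values[i] + other.values[i];
        }
        return new ImmutableVector(result);
    }

    double dot(ImmutableVector other) {
        requireSameCardinality(other);
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] * other.values[i];
        }
        return sum;
    }

    // Returns a new vector with every element divided by x.
    ImmutableVector divide(double x) {
        if (x == 0.0) {
            // Unchecked, so callers are not forced to catch it.
            throw new ArithmeticException("division by zero");
        }
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = values[i] / x;
        }
        return new ImmutableVector(result);
    }

    // Assumption: normalize means scaling to unit Euclidean length.
    ImmutableVector normalize() {
        return divide(Math.sqrt(dot(this)));
    }

    private void requireSameCardinality(ImmutableVector other) {
        if (other.values.length != values.length) {
            throw new IllegalArgumentException(
                "cardinality mismatch: " + values.length
                    + " vs " + other.values.length);
        }
    }
}
```

With runtime exceptions, a call such as a.plus(b) needs no try/catch at the 
call site; a cardinality mismatch is treated as a programming error rather 
than a condition the caller is expected to recover from.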

I will wait until MAHOUT-5 gets committed before beginning the refactoring, 
since it will be a major change from the latest working patch and this code 
really needs to be in trunk too before I begin.

> Need a matrix implementation
> ----------------------------
>
>                 Key: MAHOUT-6
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-6
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>         Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different 
> reducers
> g) a reasonable set of matrix operations should be supported, these should 
> eventually include:
>     simple matrix-matrix and matrix-vector and matrix-scalar linear algebra 
> operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>     row and column sums  
>     generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + 
> beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
