Yeah, +1 on the wrapper idea.

On Mar 17, 2008, at 11:35 AM, Jason Rennie wrote:

Labels are certainly valuable (esp. for text) and if they are somehow built into the matrix lib, it will make the user's life easier. I share similar concerns w/ Ted and think his idea for a LabelWrapper class is a great idea.

Jason

On Sun, Mar 16, 2008 at 5:28 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:


I have been batting that question back and forth in my own head recently.

It IS absolutely a huge help to have labels. R has the data.frame to do this and it helps enormously. I have done it in some applications and it saved endless hassle.

On the other hand, there is a real danger that the label functionality would get sucked into a single implementation. Labels really are an orthogonal concern that is (or should be) independent of how the matrix is implemented.

So should there really be something like a LabeledMatrix wrapper that
provides this labeling service to any matrix?
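
To make the wrapper idea concrete, here is a minimal Java sketch of how labels could sit on top of any matrix implementation. Everything in it is an assumption made for illustration: the Matrix interface, the DenseMatrix class, and the method names are hypothetical and are not taken from the MAHOUT-6 patches.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal matrix contract; not Mahout's actual API.
interface Matrix {
    double get(int row, int col);
    void set(int row, int col, double value);
    int numRows();
    int numCols();
}

// One possible backing store; a sparse implementation would plug in the same way.
class DenseMatrix implements Matrix {
    private final int rows;
    private final int cols;
    private final double[][] values;

    DenseMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.values = new double[rows][cols];
    }

    public double get(int row, int col) { return values[row][col]; }
    public void set(int row, int col, double value) { values[row][col] = value; }
    public int numRows() { return rows; }
    public int numCols() { return cols; }
}

// Wrapper that maps labels to indices and delegates storage to the wrapped
// matrix, so labeling stays independent of how the matrix is implemented.
class LabeledMatrix {
    private final Matrix delegate;
    private final Map<String, Integer> rowIndex = new HashMap<String, Integer>();
    private final Map<String, Integer> colIndex = new HashMap<String, Integer>();

    LabeledMatrix(Matrix delegate, List<String> rowLabels, List<String> colLabels) {
        if (rowLabels.size() != delegate.numRows() || colLabels.size() != delegate.numCols()) {
            throw new IllegalArgumentException("label counts must match the matrix shape");
        }
        this.delegate = delegate;
        for (int i = 0; i < rowLabels.size(); i++) {
            rowIndex.put(rowLabels.get(i), i);
        }
        for (int j = 0; j < colLabels.size(); j++) {
            colIndex.put(colLabels.get(j), j);
        }
    }

    double get(String row, String col) {
        return delegate.get(lookup(rowIndex, row), lookup(colIndex, col));
    }

    void set(String row, String col, double value) {
        delegate.set(lookup(rowIndex, row), lookup(colIndex, col), value);
    }

    // Translates a label to its index, failing loudly on unknown labels.
    private static int lookup(Map<String, Integer> index, String label) {
        Integer position = index.get(label);
        if (position == null) {
            throw new IllegalArgumentException("unknown label: " + label);
        }
        return position;
    }
}
```

Because the wrapper only talks to the Matrix interface, the same labels could be attached to a dense or a sparse matrix without either implementation knowing about them.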


On 3/16/08 2:23 PM, "Grant Ingersoll (JIRA)" <[EMAIL PROTECTED]> wrote:


   [ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579261#action_12579261 ]

Grant Ingersoll commented on MAHOUT-6:
--------------------------------------

Does it make sense to be able to assign labels to the rows and columns
and maybe even have them accessible as a map? For instance, I think I
could use these for the Bayesian classifier implementation I am working
on and it would make sense to be able to label the features and the
labels. Naturally, I can store the information elsewhere as well, but
didn't know whether it made sense to keep the info w/ the matrix.

Need a matrix implementation
----------------------------

               Key: MAHOUT-6
               URL: https://issues.apache.org/jira/browse/MAHOUT-6
           Project: Mahout
        Issue Type: New Feature
          Reporter: Ted Dunning
          Assignee: Grant Ingersoll
       Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff,
MAHOUT-6d.diff, MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff,
MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff, MAHOUT-6k.diff,
MAHOUT-6l.patch


We need matrices for Mahout.
An initial set of basic requirements includes:
a) sparse and dense support are required
b) row and column labels are important
c) serialization for hadoop use is required
d) reasonable floating point performance is required, but awesome FP is not
e) the API should be simple enough to understand
f) it should be easy to carve out sub-matrices for sending to different reducers
g) a reasonable set of matrix operations should be supported, these should eventually include:
   simple matrix-matrix and matrix-vector and matrix-scalar linear algebra operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
   row and column sums
   generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + beta v
h) easy and efficient iteration constructs, especially for sparse matrices
i) easy to extend with new implementations
i) easy to extend with new implementations
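
As a rough illustration of the generalized BLAS primitives in item g, the Java sketch below computes alpha A B + beta C and alpha A u + beta v on plain arrays. The class and method names are hypothetical and are not part of any attached patch. The alpha factor on the matrix-vector form is an assumption that follows the usual BLAS gemv convention (the list above writes it as A u + beta v), and the code assumes non-empty inputs with conforming dimensions.

```java
// Hypothetical helper class; names and array-based signatures are assumptions.
final class BlasSketch {
    private BlasSketch() {}

    // Level 3 style primitive: returns alpha * A * B + beta * C as a new array.
    // Assumes A is n x k, B is k x m, C is n x m, all non-empty.
    static double[][] gemm(double alpha, double[][] a, double[][] b,
                           double beta, double[][] c) {
        int n = a.length;
        int k = b.length;
        int m = b[0].length;
        double[][] result = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    sum += a[i][p] * b[p][j];
                }
                result[i][j] = alpha * sum + beta * c[i][j];
            }
        }
        return result;
    }

    // Level 2 style primitive: returns alpha * A * u + beta * v as a new array.
    // Assumes A is n x k, u has length k, v has length n.
    static double[] gemv(double alpha, double[][] a, double[] u,
                         double beta, double[] v) {
        int n = a.length;
        double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int p = 0; p < u.length; p++) {
                sum += a[i][p] * u[p];
            }
            result[i] = alpha * sum + beta * v[i];
        }
        return result;
    }
}
```

Setting alpha = 1 and beta = 0 reduces these to the plain products A B and A u, so the simpler operations in the list fall out as special cases.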




--
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




