Yeah, +1 on the wrapper idea.
On Mar 17, 2008, at 11:35 AM, Jason Rennie wrote:
Labels are certainly valuable (esp. for text) and if they are somehow
built into the matrix lib, it will make the user's life easier. I share
similar concerns w/ Ted and think his idea for a LabelWrapper class is a
great idea.
Jason
On Sun, Mar 16, 2008 at 5:28 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
I have been batting that question back and forth in my own head recently.

It IS absolutely a huge help to have labels. R has the data.frame to do
this and it helps enormously. I have done it in some applications and it
saved endless hassle.

On the other hand, there is a real danger that the label functionality
would get sucked into a single implementation. Labels really are an
orthogonal concern that is (or should be) independent of how the matrix
is implemented.

So should there really be something like a LabeledMatrix wrapper that
provides this labeling service to any matrix?
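
The wrapper idea above can be sketched roughly as follows. This is only an illustration of the delegation pattern, not code from any of the attached patches: the Matrix interface, the DenseMatrix class, and every method name here are hypothetical stand-ins chosen for the example. The point is that the labels live in the wrapper, so they work with any implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal matrix interface; NOT Mahout's actual API.
interface Matrix {
    double get(int row, int col);
    void set(int row, int col, double value);
}

// One possible implementation; the wrapper below never depends on it.
class DenseMatrix implements Matrix {
    private final double[][] values;

    DenseMatrix(int rows, int cols) {
        values = new double[rows][cols];
    }

    public double get(int row, int col) {
        return values[row][col];
    }

    public void set(int row, int col, double value) {
        values[row][col] = value;
    }
}

// Adds row and column labels to any Matrix by delegation.
class LabeledMatrix implements Matrix {
    private final Matrix delegate;
    private final Map<String, Integer> rowLabels = new HashMap<>();
    private final Map<String, Integer> colLabels = new HashMap<>();

    LabeledMatrix(Matrix delegate) {
        this.delegate = delegate;
    }

    void labelRow(String label, int row) {
        rowLabels.put(label, row);
    }

    void labelColumn(String label, int col) {
        colLabels.put(label, col);
    }

    // Index-based access passes straight through to the wrapped matrix.
    public double get(int row, int col) {
        return delegate.get(row, col);
    }

    public void set(int row, int col, double value) {
        delegate.set(row, col, value);
    }

    // Label-based access resolves labels to indices first.
    double get(String rowLabel, String colLabel) {
        return delegate.get(index(rowLabels, rowLabel), index(colLabels, colLabel));
    }

    void set(String rowLabel, String colLabel, double value) {
        delegate.set(index(rowLabels, rowLabel), index(colLabels, colLabel), value);
    }

    private static int index(Map<String, Integer> labels, String label) {
        Integer i = labels.get(label);
        if (i == null) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        return i;
    }
}
```

With something like this, a sparse implementation could be swapped in for DenseMatrix without touching the labeling code, which is the separation being argued for here.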
On 3/16/08 2:23 PM, "Grant Ingersoll (JIRA)" <[EMAIL PROTECTED]> wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579261#action_12579261 ]
Grant Ingersoll commented on MAHOUT-6:
--------------------------------------
Does it make sense to be able to assign labels to the rows and columns
and maybe even have it accessible as a map? For instance, I think I
could use these for the bayesian classifier implementation I am working
on and it would make sense to be able to label the features and the
labels. Naturally, I can store the information elsewhere as well, but
didn't know whether it made sense to keep the info w/ the matrix.
Need a matrix implementation
----------------------------
Key: MAHOUT-6
URL: https://issues.apache.org/jira/browse/MAHOUT-6
Project: Mahout
Issue Type: New Feature
Reporter: Ted Dunning
Assignee: Grant Ingersoll
Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff,
MAHOUT-6d.diff, MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff,
MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff, MAHOUT-6k.diff,
MAHOUT-6l.patch
We need matrices for Mahout.
An initial set of basic requirements includes:
a) sparse and dense support are required
b) row and column labels are important
c) serialization for hadoop use is required
d) reasonable floating point performance is required, but awesome FP is not
e) the API should be simple enough to understand
f) it should be easy to carve out sub-matrices for sending to different reducers
g) a reasonable set of matrix operations should be supported, these should eventually include:
    simple matrix-matrix and matrix-vector and matrix-scalar linear algebra operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
    row and column sums
    generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + beta v
h) easy and efficient iteration constructs, especially for sparse matrices
i) easy to extend with new implementations
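
As a concrete reading of the level 2 primitive in item g), here is a minimal sketch over plain Java arrays. The class name Blas2Sketch and the method name gemv are made up for this example (gemv is borrowed from BLAS naming) and are not part of any Mahout API. The sketch takes an explicit alpha; passing alpha = 1 gives the "A u + beta v" form listed above.

```java
// Hypothetical helper for illustration only; not a Mahout class.
final class Blas2Sketch {
    private Blas2Sketch() {}

    // Returns alpha * A * u + beta * v as a new array; inputs are not modified.
    static double[] gemv(double alpha, double[][] a, double[] u, double beta, double[] v) {
        if (a.length != v.length) {
            throw new IllegalArgumentException("A must have one row per element of v");
        }
        double[] result = new double[v.length];
        for (int i = 0; i < a.length; i++) {
            if (a[i].length != u.length) {
                throw new IllegalArgumentException("Each row of A must match the length of u");
            }
            // Dot product of row i of A with u.
            double dot = 0.0;
            for (int j = 0; j < u.length; j++) {
                dot += a[i][j] * u[j];
            }
            result[i] = alpha * dot + beta * v[i];
        }
        return result;
    }
}
```

A sparse implementation would presumably iterate only over the non-zero entries of each row, which is where the iteration constructs in item h) would come in.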
--
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
--------------------------
Grant Ingersoll