Matrix A has N rows (each of which has cardinality M_A), and Matrix B has
N rows (each of which has cardinality M_B).
I suppose this is where I get confused. I thought, by definition, matrix
A has dimensions (n by m), and matrix B has dimensions (m by p), and the
resulting matrix is (n by p). I saw in the implementation that it
cleverly uses the transpose of A such that just the row vectors are
needed, but my confusion comes from the fact that I don't see an
explicit transpose before the times() job gets going.
So, in a toy example, A = [3 by 2], B = [2 by 2], it looks to me as if
the three rows of A are being sent to the MR job with the two rows of B,
which doesn't make any sense. I know there should be a transpose of A
somewhere but I don't see it.
Unless the assumption is that the user calls transpose() before calling
times()? Which doesn't make any sense either since I've used this job
just fine. I know I'm missing something simple...thanks for your help.
Also: I'll shelve the general DRM rewrite patch, then, for the time
being. You make good points, and there are other patches I should work
on in the meantime :) (though I could just experiment with 0.21 to see
how well that works)
Shannon
There are thus N pairs of vectors {A_i, B_i}, one pair per shared row
index i. Each A_i^T x B_i is an outer product (an M_A by M_B matrix), so
if you take Sum_{i=1..N} (A_i^T x B_i), you get a matrix with M_A rows,
each of which has cardinality M_B, and this matrix is exactly A^T * B.
*You take the transpose on the vectors, a row at a time*, from the first
of the two matrices.
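
To make the row-at-a-time transpose concrete, here is a minimal sketch in
plain Python. It is not Mahout's MapReduce code; the names outer,
transpose_times and naive_transpose_times are made up for illustration,
and the matrices are small in-memory lists. It only shows that summing
the outer products of paired rows gives A^T * B, and that the two inputs
must have the same number of rows.

```python
def outer(u, v):
    """Outer product u^T x v: a len(u)-by-len(v) matrix."""
    return [[ui * vj for vj in v] for ui in u]


def transpose_times(a_rows, b_rows):
    """Compute A^T * B by summing outer products of paired rows.

    a_rows: N rows, each of cardinality M_A
    b_rows: N rows, each of cardinality M_B
    Returns an M_A-by-M_B matrix. No explicit transpose of A is built.
    """
    if not a_rows or len(a_rows) != len(b_rows):
        raise ValueError("A and B must be non-empty with the same number of rows")
    m_a, m_b = len(a_rows[0]), len(b_rows[0])
    result = [[0] * m_b for _ in range(m_a)]
    for a_i, b_i in zip(a_rows, b_rows):
        partial = outer(a_i, b_i)  # "transpose" happens here, one row at a time
        for r in range(m_a):
            for c in range(m_b):
                result[r][c] += partial[r][c]
    return result


def naive_transpose_times(a_rows, b_rows):
    """Reference version: build A^T explicitly, then multiply by B."""
    a_t = [list(col) for col in zip(*a_rows)]
    b_cols = [list(col) for col in zip(*b_rows)]
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_t]


A = [[1, 2], [3, 4], [5, 6]]            # N = 3, M_A = 2
B = [[1, 0, 2], [0, 1, 1], [2, 1, 0]]   # N = 3, M_B = 3

print(transpose_times(A, B))
print(transpose_times(A, B) == naive_transpose_times(A, B))
```

Under this convention a [3 by 2] A paired with a [2 by 2] B is rejected,
since the row counts differ; a [3 by 2] A and a [3 by 3] B produce a
[2 by 3] result.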
-jake
I want to understand this little bit so I adequately replicate it in the
new patch. Thanks!
Shannon
On Dec 29, 2010, at 1:06, "Shannon Quinn (JIRA)"<[email protected]> wrote:
[
https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Shannon Quinn updated MAHOUT-537:
---------------------------------
Attachment: MAHOUT-537.patch
Updated patch. Fixes from previous patch are included, this time merged
with unrelated changes to the related files. Also removed all the
commented-out old code, and even caught and fixed a few bugs. Fully
implemented timesSquared(). All that remains is the times(DRM) job. Will
update on this very soon.
(regarding the previous comments on this ticket: I'm using Hadoop 0.20.2)
Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
-------------------------------------------------------------
Key: MAHOUT-537
URL: https://issues.apache.org/jira/browse/MAHOUT-537
Project: Mahout
Issue Type: Improvement
Affects Versions: 0.4
Reporter: Shannon Quinn
Assignee: Shannon Quinn
Attachments: MAHOUT-537.patch, MAHOUT-537.patch
Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2
API, in particular eliminate dependence on the deprecated JobConf, using
instead the separate Job and Configuration objects.