Matrix A has N rows (each of which has cardinality M_A), and Matrix B has
N rows (each of which has cardinality M_B).
I suppose this is where I get confused. By definition, I thought matrix A has dimensions (n by m), matrix B has dimensions (m by p), and the product is (n by p). I saw that the implementation cleverly uses the transpose of A so that only row vectors are needed, but I don't see an explicit transpose anywhere before the times() job gets going.

So, in a toy example, A = [3 by 2], B = [2 by 2], it looks to me as if the three rows of A are being sent to the MR job with the two rows of B, which doesn't make any sense. I know there should be a transpose of A somewhere but I don't see it.

Unless the assumption is that the user calls transpose() before calling times()? Which doesn't make any sense either since I've used this job just fine. I know I'm missing something simple...thanks for your help.

Also: I'll shelve the general DRM rewrite patch, then, for the time being. You make good points, and there are other patches I should work on in the meantime :) (though I could just experiment with 0.21 to see how well that works)

Shannon

There are thus N pairs of vectors {A_i, B_i}, and if you take
MatrixSum_{i=1,N} (A_i^T x B_i), where each term is the outer product of
one row of A with the matching row of B, you get a matrix with M_A rows,
each of which has cardinality M_B, and this matrix is exactly A^T * B.

*You take the transpose of the vectors, one row at a time*, from the first
of the two matrices.
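
The sum of outer products described above can be checked on a small example. What follows is a minimal pure-Python sketch, not Mahout's code: the function names (outer, transpose_times, and so on) are made up for illustration, and plain nested lists stand in for the distributed row vectors. It assumes only what is stated above, namely that both matrices have the same number of rows N.

```python
def outer(u, v):
    """Outer product u^T x v of two row vectors, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def add(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def transpose_times(A, B):
    """Compute A^T * B by summing the outer products of paired rows A_i, B_i."""
    if len(A) != len(B):
        raise ValueError("A and B must have the same number of rows")
    m_a, m_b = len(A[0]), len(B[0])
    result = [[0] * m_b for _ in range(m_a)]
    for a_row, b_row in zip(A, B):
        # Each pair of rows contributes one M_A by M_B matrix to the sum.
        result = add(result, outer(a_row, b_row))
    return result


def transpose(A):
    """Explicit transpose, used here only to check the result."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Textbook (n by m) times (m by p) product, used here only as a check."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# N = 2 rows in each matrix, M_A = 3, M_B = 2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10]]

result = transpose_times(A, B)
print(result)  # [[43, 48], [59, 66], [75, 84]]
print(result == matmul(transpose(A), B))  # True
```

Note that the stored A here is 2 by 3, and the 3 by 2 result equals the ordinary product of its 3 by 2 transpose with B. No transposed copy of A is ever built inside transpose_times; the transpose only appears implicitly, one row at a time, in each outer product.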

   -jake


I want to understand this little bit so I adequately replicate it in the
new patch. Thanks!

Shannon


On Dec 29, 2010, at 1:06, "Shannon Quinn (JIRA)" wrote:

[ https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shannon Quinn updated MAHOUT-537:
---------------------------------

    Attachment: MAHOUT-537.patch

Updated patch. Fixes from previous patch are included, this time merged
with unrelated changes to the related files. Also removed all the
commented-out old code, and even caught and fixed a few bugs. Fully
implemented timesSquared(). All that remains is the times(DRM) job. Will
update on this very soon.
(regarding the previous comments on this ticket: I'm using Hadoop 0.20.2)

Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
-------------------------------------------------------------

                Key: MAHOUT-537
                URL: https://issues.apache.org/jira/browse/MAHOUT-537
            Project: Mahout
         Issue Type: Improvement
   Affects Versions: 0.4
           Reporter: Shannon Quinn
           Assignee: Shannon Quinn
        Attachments: MAHOUT-537.patch, MAHOUT-537.patch


Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2
API, in particular eliminate dependence on the deprecated JobConf, using
instead the separate Job and Configuration objects.

