Hi Shannon, sorry to have been absent from so much of this thread!
On Thu, Dec 30, 2010 at 2:16 PM, Shannon Quinn <[email protected]> wrote:
> I'm just about finished with this patch (though I'm road tripping at the
> moment), but I wanted to seek some clarification on the mechanics behind
> DRM's matrix multiplication.
>
> I see upon closer inspection that what is actually used is the transpose of
> the multiplicand (matrix A^T in A*B), thereby using only matrix rows (how
> DRMs are organized across HDFS). However, I didn't see any explicit
> transpose operation within the times() method. How is this carried out?
>
The transpose is a side effect of the fact that a DRM just consists of a
list of vectors, which you can view as either a row-based matrix or a
column-based matrix. The matrix multiplication works like so:

Matrix A has N rows (each of which has cardinality M_A), and Matrix B has
N rows (each of which has cardinality M_B). There are thus N pairs of
vectors {A_i, B_i}, and if you take MatrixSum_{i=1..N} (A_i^T x B_i), where
each term is the outer product of row A_i (taken as a column vector) with
row B_i, you get a matrix with M_A rows, each of which has cardinality M_B,
and this matrix is exactly A^T * B.

*You take the transpose on the vectors, one row at a time*, from the first
of the two matrices. No explicit transpose of the whole matrix is ever needed.
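
To make the sum-of-outer-products identity concrete, here is a minimal sketch in plain Python. It is not Mahout code and does not reflect how the actual MapReduce job is written; the function names (transpose_times, explicit_transpose_times) and the small example matrices are made up purely for illustration. It accumulates the outer product of each row pair and compares the result against an explicit transpose-then-multiply.

```python
def transpose_times(a_rows, b_rows):
    """Compute A^T * B using only the rows of A and B.

    a_rows: N rows, each of length M_A
    b_rows: N rows, each of length M_B
    Returns an M_A x M_B matrix as a list of lists.
    """
    if len(a_rows) != len(b_rows):
        raise ValueError("A and B must have the same number of rows")
    m_a = len(a_rows[0])
    m_b = len(b_rows[0])
    result = [[0] * m_b for _ in range(m_a)]
    # For each row pair (A_i, B_i), add the outer product A_i^T x B_i.
    for a_i, b_i in zip(a_rows, b_rows):
        for r, a_val in enumerate(a_i):
            for c, b_val in enumerate(b_i):
                result[r][c] += a_val * b_val
    return result


def explicit_transpose_times(a_rows, b_rows):
    """Reference version: build A^T explicitly, then multiply by B."""
    a_t = [list(col) for col in zip(*a_rows)]  # M_A x N
    n = len(b_rows)
    m_b = len(b_rows[0])
    return [
        [sum(a_t_row[k] * b_rows[k][c] for k in range(n)) for c in range(m_b)]
        for a_t_row in a_t
    ]


# Illustrative data: N = 3, M_A = 2, M_B = 3.
A = [[1, 2], [3, 4], [5, 6]]
B = [[1, 0, 2], [0, 1, 3], [4, 5, 6]]

product = transpose_times(A, B)
print(product)  # [[21, 28, 41], [26, 34, 52]]
print(product == explicit_transpose_times(A, B))  # True
```

Each iteration of the outer loop touches only row i of A and row i of B, which is why the row-wise layout of a DRM is enough to produce A^T * B.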
-jake
> I want to understand this little bit so I adequately replicate it in the
> new patch. Thanks!
>
> Shannon
>
> On Dec 29, 2010, at 1:06, "Shannon Quinn (JIRA)" <[email protected]> wrote:
>
> >
> > [
> https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
> >
> > Shannon Quinn updated MAHOUT-537:
> > ---------------------------------
> >
> > Attachment: MAHOUT-537.patch
> >
> > Updated patch. Fixes from previous patch are included, this time merged
> > with unrelated changes to the related files. Also removed all the
> > commented-out old code, and even caught and fixed a few bugs. Fully
> > implemented timesSquared(). All that remains is the times(DRM) job. Will
> > update on this very soon.
> >
> > (regarding the previous comments on this ticket: I'm using Hadoop 0.20.2)
> >
> >> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> >> -------------------------------------------------------------
> >>
> >> Key: MAHOUT-537
> >> URL: https://issues.apache.org/jira/browse/MAHOUT-537
> >> Project: Mahout
> >> Issue Type: Improvement
> >> Affects Versions: 0.4
> >> Reporter: Shannon Quinn
> >> Assignee: Shannon Quinn
> >> Attachments: MAHOUT-537.patch, MAHOUT-537.patch
> >>
> >>
> >> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2
> >> API, in particular eliminate dependence on the deprecated JobConf, using
> >> instead the separate Job and Configuration objects.
> >