pi song
Fri, 13 Jun 2008 04:04:20 -0700
To my knowledge, Map Reduce was designed primarily for problems that can be partitioned in 1 dimension. Some problems, e.g. matrix multiplication, may require higher dimension. I would say the real constraint is the processing model. Seems like the right way is to look at sparse matrices for real world applications. Thanks for thoughtful opinion! Pi On Thu, Jun 12, 2008 at 3:59 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > > The more that I look at problems that include matrix multiplication as a > core element, the more that I think that large tuples are a bad way to > encode these matrices. > > I am assuming very sparse matrices here which is definitely the norm in > what I do. Very large dense matrices would probably require different > approaches. > > To make what I suggest, here is an example. Suppose that we have two > sparse matrices A and B and we want to A' A and A' B. This is useful in > coocurrence analysis, for instance. > > One possible encoding for a matrix is to have a variable sized tuple that > starts with a row number and contains a sequence of column number, value > pairs. Another encoding which appears less efficient at first glance is to > have triples with row number, column number and value. > > Matrix multiplication of the second form can be written (roughly) as the > result of co-grouping A with A on rows or A on columns with B on rows and > emitting the product of corresponding elements. These output elements are > then grouped on index pairs and summed. In the A'A case, you can probably > do with one less group statement. > > This seems really inefficient compared to passing entire rows around > because the number of records is so much larger. But it turns out that it > is really pretty fast. A comparable problem was worked on by Chris Dyer at > University of Maryland who go almost perfect speedup relative to a > hand-coded uni-processor C program by using this kind of approach. > > Most importantly, this approach completely side-steps the problem of large > tuples. > > On Tue, Jun 10, 2008 at 8:42 PM, pi song <[EMAIL PROTECTED]> wrote: > >> ... basically). I was thinking about matrix processing in Pig before but >> it >> seemed to be difficult due to this fact. >> >>