On Fri, Dec 4, 2009 at 12:33 AM, Sean Owen <[email protected]> wrote:

> (Though I'm curious how this works -- looks deceptively easy, this
> outer product approach. Isn't v cross v potentially huge? or likely to
> be sparse enough to not matter)

v cross v is huge, but very sparse... *but* sum_i (v_i cross v_i) is indeed going to get pretty dense, so the reducer may totally blow up if care is not taken, because it's all being done in memory in the reducer (i.e. yes, all of A'A lives in memory just before the reducer completes, in my naive impl).

> I understand the final step in principle, which is to compute (A'A)h.
> But I keep guessing A'A is too big to fit in memory? So I can
> side-load the rows of A'A one at a time and compute it rather
> manually.

How would you do this? Take the rows of v_i cross v_i and add them up? Isn't that another MR job?

  -jake
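
To make the memory question concrete, here is a minimal single-machine sketch in Python (not the MR job discussed above). It assumes each row v_i of A is a sparse dict of {column: value}; the function names outer_product_sum and ata_times_h are made up for illustration. The first function accumulates A'A = sum_i (v_i cross v_i) the naive way, with the whole sum held in memory. The second relies on the identity (A'A)h = sum_i v_i (v_i . h), which gives the same vector without ever forming A'A.

```python
from collections import defaultdict


def outer_product_sum(rows):
    """Accumulate A'A as the sum of outer products of the sparse rows of A.

    Each row is a dict {column_index: value}. The result is a dict keyed by
    (j, k) and held entirely in memory, like the naive reducer.
    """
    ata = defaultdict(float)
    for v in rows:
        # Each outer product touches only nnz(v)^2 entries, so it is sparse,
        # but the running sum over all rows can become dense.
        for j, vj in v.items():
            for k, vk in v.items():
                ata[(j, k)] += vj * vk
    return dict(ata)


def ata_times_h(rows, h):
    """Compute (A'A)h as sum_i v_i * (v_i . h) without materializing A'A.

    h is a dict {column_index: value}; missing entries count as zero.
    """
    result = defaultdict(float)
    for v in rows:
        dot = sum(val * h.get(j, 0.0) for j, val in v.items())
        for j, val in v.items():
            result[j] += val * dot
    return dict(result)
```

The second form needs one pass over the rows of A and memory proportional to the number of columns, not to the number of non-zeros in A'A. Whether that suits the job layout in this thread is a separate question.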
