And the key in each map() would correspond to the row in whichever SequenceFile it's parsing, so as long as the two files line up their keys, I'll have exactly two VectorWritables (or whatever Writable) per key in the Reducer.
Oy. That's about as simple as it gets. Thank you very much!!

Shannon

On Tue, Aug 3, 2010 at 1:17 PM, Sean Owen wrote:
> You want row N from matrix A and B?
>
> Map A to (row # -> row vector) and likewise for B. Both are input paths.
> Then the reducer has, for each row, both row vectors.
>
> You can add a custom Writable with more info about, say, which vector
> is which if you like.
>
> On Tue, Aug 3, 2010 at 10:12 AM, Shannon Quinn wrote:
> > Right, that's the concept I'd had in mind, but to me it always seems to
> > come down to having access to two distinct vectors at the same time, and
> > I'm not sure how you would do that. In my case, both the dimensions and
> > the data types of the two vectors are identical, so we're talking about a
> > merged vector of floats that's simply twice as long as the original, but
> > how to gain access to the two original vectors at the same time is beyond
> > me.
> >
> > But still, the data types I need that would do this for me are in a newer
> > Hadoop commit; I'm just trying to figure out how to build the commit
> > manually and integrate it into the core Hadoop .jar file.
> >
> > Any suggestions that would speed along either of these options are most
> > welcome.
> >
> > Shannon
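A minimal sketch of the join Sean describes, written in Python so it runs without a Hadoop cluster. It only simulates the map, shuffle, and reduce phases in memory; the function names (`map_matrix`, `shuffle`, `reduce_row`, `merge_rows`) and the "A"/"B" tags are hypothetical illustrations, not Hadoop or Mahout APIs. Each mapper emits (row # -> tagged row vector), where the tag plays the role of the custom Writable that says which matrix a vector came from. The tag matters because Hadoop does not guarantee the order in which the values for a key arrive at the reducer.

```python
from collections import defaultdict

# Hypothetical sketch: a row vector is a plain list of floats here; in a real
# job it would be a Writable such as Mahout's VectorWritable.


def map_matrix(tag, rows):
    """Mapper: for each (row index, vector), emit row index -> (tag, vector)."""
    for row_index, vector in rows:
        yield row_index, (tag, vector)


def shuffle(pairs):
    """In-memory stand-in for Hadoop's shuffle: group mapper values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_row(row_index, tagged_vectors):
    """Reducer: expects exactly one vector from A and one from B for this row,
    then concatenates them (A first) into a vector twice as long."""
    by_tag = dict(tagged_vectors)
    if len(tagged_vectors) != 2 or set(by_tag) != {"A", "B"}:
        raise ValueError(f"row {row_index}: expected one vector from A and one from B")
    return row_index, by_tag["A"] + by_tag["B"]


def merge_rows(a_rows, b_rows):
    """Run both 'input paths' through the mapper, shuffle, and reduce."""
    mapped = list(map_matrix("A", a_rows)) + list(map_matrix("B", b_rows))
    grouped = shuffle(mapped)
    return dict(reduce_row(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    a = [(0, [1.0, 2.0]), (1, [3.0, 4.0])]
    b = [(1, [7.0, 8.0]), (0, [5.0, 6.0])]  # B's rows arrive in a different order
    print(merge_rows(a, b))
    # {0: [1.0, 2.0, 5.0, 6.0], 1: [3.0, 4.0, 7.0, 8.0]}
```

Because the reducer looks vectors up by tag rather than by arrival position, the result is the same whichever file's record reaches it first. In an actual Hadoop job, one option (an assumption, not something stated in the thread) is to register each SequenceFile as its own input path with `MultipleInputs`, giving each path a mapper that attaches its own tag.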
