> Accessing a separate SequenceFile from within a Mapper is *way inefficient*
> (orders of magnitude slower).
>
> You want to do a map-side join. This is what is done in MatrixMultiplyJob:
> your Mapper gets an IntWritable as key, and the value is a Pair of
> VectorWritables, one from each matrix.
>
Excellent. Any idea what the Hadoop 0.20.2 equivalent for CompositeInputFormat is? :)
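
For reference, the map-side join described in the quoted message boils down to a merge join over two inputs that are already sorted by the same key. Below is a minimal plain-Java sketch of that idea with no Hadoop dependency. The names `MapSideJoinSketch`, `Row`, `Joined` and `join` are made up for illustration and are not Hadoop or Mahout classes; `Row` stands in for an (IntWritable, VectorWritable) record, and the sketch assumes each row index appears at most once per input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapSideJoinSketch {

    /** Hypothetical stand-in for one (IntWritable, VectorWritable) record. */
    public static final class Row {
        public final int index;
        public final double[] values;
        public Row(int index, double[] values) {
            this.index = index;
            this.values = values;
        }
    }

    /** Hypothetical stand-in for what the Mapper sees: one key, one vector from each matrix. */
    public static final class Joined {
        public final int index;
        public final double[] left;
        public final double[] right;
        public Joined(int index, double[] left, double[] right) {
            this.index = index;
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Inner merge join. Both lists must be sorted by row index, and each
     * index is assumed to occur at most once per list. Both inputs are read
     * sequentially, with no random lookups into the other one.
     */
    public static List<Joined> join(List<Row> a, List<Row> b) {
        List<Joined> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            Row ra = a.get(i);
            Row rb = b.get(j);
            if (ra.index < rb.index) {
                i++;            // row only present in a: skip it
            } else if (ra.index > rb.index) {
                j++;            // row only present in b: skip it
            } else {
                out.add(new Joined(ra.index, ra.values, rb.values));
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> a = Arrays.asList(
                new Row(0, new double[] {1.0, 2.0}),
                new Row(1, new double[] {3.0, 4.0}),
                new Row(3, new double[] {5.0, 6.0}));
        List<Row> b = Arrays.asList(
                new Row(1, new double[] {7.0}),
                new Row(2, new double[] {8.0}),
                new Row(3, new double[] {9.0}));
        for (Joined row : join(a, b)) {
            System.out.println("key " + row.index + ": "
                    + Arrays.toString(row.left) + " | " + Arrays.toString(row.right));
        }
    }
}
```

Because each input is scanned once, in order, this avoids the per-record lookups into a second SequenceFile that the quoted message warns about.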
