Re: M/R over two matrices, and computing the median

Shannon Quinn Mon, 02 Aug 2010 15:13:39 -0700

>
>
> Accessing a separate SequenceFile from within a Mapper is *way inefficient*
> (orders of magnitude slower).
>
> You want to do a map-side join.  This is what is done in
> MatrixMultiplyJob - your Mapper gets IntWritable as key, and the value
> is a Pair of VectorWritables - one from each matrix.
>


Excellent. Any idea what the Hadoop 0.20.2 equivalent for
CompositeInputFormat is? :)
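
For anyone following along, here is a rough plain-Java picture of what the
map-side join in the quoted message amounts to. It is only a sketch, not the
MatrixMultiplyJob code and not Hadoop API: the class and method names
(MapSideJoinSketch, Row, JoinedRow, innerJoin) are made up for illustration,
and it assumes both matrices are stored as rows sorted by row index with at
most one row per index in each input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a map-side (merge) join over two matrices.
// Nothing here is a Hadoop or Mahout class.
public class MapSideJoinSketch {

    /** One matrix row: its row index plus its values. */
    static final class Row {
        final int index;
        final double[] values;

        Row(int index, double[] values) {
            this.index = index;
            this.values = values;
        }
    }

    /** What the mapper would receive: one key and one row from each matrix. */
    static final class JoinedRow {
        final int index;
        final double[] left;
        final double[] right;

        JoinedRow(int index, double[] left, double[] right) {
            this.index = index;
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Walks both inputs once, in index order, and pairs up rows that share
     * an index. Assumes each list is sorted by index with no duplicates.
     */
    static List<JoinedRow> innerJoin(List<Row> a, List<Row> b) {
        List<JoinedRow> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            Row ra = a.get(i);
            Row rb = b.get(j);
            if (ra.index < rb.index) {
                i++;                       // row only present in a: skip it
            } else if (ra.index > rb.index) {
                j++;                       // row only present in b: skip it
            } else {
                out.add(new JoinedRow(ra.index, ra.values, rb.values));
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> a = Arrays.asList(
                new Row(0, new double[] {1, 2}),
                new Row(1, new double[] {3, 4}),
                new Row(3, new double[] {5, 6}));
        List<Row> b = Arrays.asList(
                new Row(1, new double[] {7, 8}),
                new Row(2, new double[] {9, 10}),
                new Row(3, new double[] {11, 12}));

        // Each joined row plays the role of one map() call's (key, value).
        for (JoinedRow r : innerJoin(a, b)) {
            System.out.println(r.index + " -> " + Arrays.toString(r.left)
                    + " | " + Arrays.toString(r.right));
        }
    }
}
```

The point of the single pass is that neither input is ever looked up at
random from inside the mapper, which is where the cost of opening a second
SequenceFile per record comes from.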
