Then what is the main difference between (1) storing the input in a shared cluster directory (e.g., on HDFS) and loading it in the configure stage of the mappers, and (2) using the distributed cache?

Shi

On 4/25/2011 8:17 AM, Kai Voigt wrote:
Hi,

I'd use the distributed cache to store the vector locally on every
mapper machine.
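
For concreteness, here's a minimal sketch against the 0.20-era API. The
HDFS path, the file format (one vector entry per line), and the class
names are just placeholders:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class VectorCacheSketch {

    // Driver side: register the vector file (already in HDFS) so the
    // framework copies it to the local disk of every task node.
    public static void addVectorToCache(Job job) throws Exception {
        DistributedCache.addCacheFile(
                new URI("/user/alexandra/vector.txt"),  // placeholder path
                job.getConfiguration());
    }

    public static class MatrixMapper
            extends Mapper<LongWritable, Text, IntWritable, DoubleWritable> {

        private final List<Double> v = new ArrayList<Double>();

        @Override
        protected void setup(Context context) throws IOException {
            // Read the locally cached copy once per mapper, not once
            // per input record.
            Path[] cached =
                    DistributedCache.getLocalCacheFiles(context.getConfiguration());
            BufferedReader in =
                    new BufferedReader(new FileReader(cached[0].toString()));
            String line;
            while ((line = in.readLine()) != null) {
                v.add(Double.parseDouble(line.trim()));  // one entry per line
            }
            in.close();
        }
        // map() can now use the whole vector v for every record it sees.
    }
}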

Kai

On 22.04.2011, at 21:15, Alexandra Anghelescu wrote:

Hi all,

I am trying to perform matrix-vector multiplication using Hadoop.
So I have matrix M in a file, and vector v in another file. How can I make
it so that each Map task will get the whole vector v and a chunk of matrix
M?
Basically I want my map function to output key-value pairs (i, m[i,j]*v[j]),
where i is the row number and j is the column number. The reduce function
will then sum up all the values that share the same key i, and that sum is
the ith element of my result vector.
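
Roughly what I have in mind, as a minimal sketch (the one-triple-per-line
matrix format is just an assumption, and v stands for the vector, somehow
already loaded into each mapper, which is exactly what I'm asking about):

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixVectorSketch {

    public static class MultiplyMapper
            extends Mapper<LongWritable, Text, IntWritable, DoubleWritable> {

        private double[] v;  // assumed loaded in setup() somehow (my question)

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Assumed input format: one "i j m[i,j]" triple per line.
            String[] parts = line.toString().trim().split("\\s+");
            int i = Integer.parseInt(parts[0]);
            int j = Integer.parseInt(parts[1]);
            double mij = Double.parseDouble(parts[2]);
            // Emit the partial product for row i.
            context.write(new IntWritable(i), new DoubleWritable(mij * v[j]));
        }
    }

    public static class SumReducer
            extends Reducer<IntWritable, DoubleWritable, IntWritable, DoubleWritable> {

        @Override
        protected void reduce(IntWritable row, Iterable<DoubleWritable> products,
                              Context context) throws IOException, InterruptedException {
            double sum = 0.0;
            for (DoubleWritable p : products) {
                sum += p.get();  // sum over j of m[i,j]*v[j]
            }
            // The summed value for key i is the ith entry of the result vector.
            context.write(row, new DoubleWritable(sum));
        }
    }
}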
Or can you suggest another way to do it?


Thanks,
Alexandra Anghelescu
