The Distributed Row Matrix should be ideal for this.  When you run mappers
against this data structure, each map call receives a different row.  In the
mapper you can use assign to apply your function to each element of that row.
Set the number of reducers to 0 and you are set.
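
A rough sketch of that map-only pattern, in case it helps. This is plain Java
with no Hadoop or Mahout dependency, so every name in it (MapOnlyRowSketch,
mapRow, runMapOnly) is made up for illustration, and the rows are dense arrays
rather than sparse vectors. In a real Hadoop job the "no reducers" part would
be a call to setNumReduceTasks(0) on the job configuration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-in for a map-only job over a row-keyed matrix.
// Nothing here is the Mahout API; it only mimics the shape of the computation.
public class MapOnlyRowSketch {

    // Per-row "map" step: apply f to every element of one row, in place,
    // which is the role an assign-style call would play inside the mapper.
    static double[] mapRow(double[] row, DoubleUnaryOperator f) {
        for (int i = 0; i < row.length; i++) {
            row[i] = f.applyAsDouble(row[i]);
        }
        return row;
    }

    // "Job" with zero reducers: each row is processed independently and
    // emitted under its original row key. Input rows are copied, not modified.
    static Map<Integer, double[]> runMapOnly(Map<Integer, double[]> rows,
                                             DoubleUnaryOperator f) {
        Map<Integer, double[]> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, double[]> e : rows.entrySet()) {
            out.put(e.getKey(), mapRow(e.getValue().clone(), f));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> rows = new LinkedHashMap<>();
        rows.put(0, new double[] {1.0, 2.0, 3.0});
        rows.put(1, new double[] {0.0, 4.0, 0.0});

        // Example function f: square each element.
        Map<Integer, double[]> result = runMapOnly(rows, x -> x * x);
        for (Map.Entry<Integer, double[]> e : result.entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

Because no row depends on any other, the output is just the input matrix with
f applied elementwise, which is why a reduce step is unnecessary here.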

Are you sure that you don't need some kind of reduction function, however?

You might also look at the k-means clustering code, which is probably related
to what you are doing.

On Sun, May 30, 2010 at 3:24 PM, Sisir Koppaka <[email protected]> wrote:

> I think I need the sort of operation Jake described above,
> wherein I can call a function f on a vector of the whole matrix (the dataset
> here, which is sparse) in a distributed fashion. I'll look at this in detail
> tomorrow. But any other pointers on this issue with reference to the
> MAHOUT-375.diff update are very welcome.
>
