The Distributed Row Matrix should be ideal for this. When you run mappers against this data structure, each mapper call gets a different row. In the mapper, you can use assign to apply your function to each element of that row. Set the number of reducers to 0 and you are set.
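A rough sketch of that map-only pattern, in plain Java so it runs without a cluster. It does not use the Mahout or Hadoop APIs: the class RowWiseMapSketch, the methods mapRow and mapRows, and the Map-based sparse row are all hypothetical stand-ins for a mapper that receives one row at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

public class RowWiseMapSketch {

    // Stand-in for the map step: apply f to every stored element of one sparse row.
    // A row is modeled as column index -> value; this is NOT Mahout's Vector type.
    static Map<Integer, Double> mapRow(Map<Integer, Double> row, DoubleUnaryOperator f) {
        Map<Integer, Double> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, Double> e : row.entrySet()) {
            out.put(e.getKey(), f.applyAsDouble(e.getValue()));
        }
        return out;
    }

    // Stand-in for a map-only job: each row is processed independently and emitted
    // under its own row index. No reduce step combines rows afterwards.
    static Map<Integer, Map<Integer, Double>> mapRows(
            Map<Integer, Map<Integer, Double>> matrix, DoubleUnaryOperator f) {
        Map<Integer, Map<Integer, Double>> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> e : matrix.entrySet()) {
            out.put(e.getKey(), mapRow(e.getValue(), f));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two sparse rows of a small example matrix (made-up values).
        Map<Integer, Double> row0 = new LinkedHashMap<>();
        row0.put(0, 1.0);
        row0.put(3, 4.0);
        Map<Integer, Double> row1 = new LinkedHashMap<>();
        row1.put(2, 9.0);

        Map<Integer, Map<Integer, Double>> matrix = new LinkedHashMap<>();
        matrix.put(0, row0);
        matrix.put(1, row1);

        // Apply sqrt element-wise, row by row.
        System.out.println(mapRows(matrix, Math::sqrt));
    }
}
```

Note that the sketch only touches stored entries, which equals applying f to every element only when f(0) = 0. If the rows have to be combined afterwards (a sum, a norm, cluster centroids), that is where a reducer would come back in.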
Are you sure that you don't need some kind of reduction function, however? You might also look at the k-means clustering which probably is related to what you are doing in some sense.

On Sun, May 30, 2010 at 3:24 PM, Sisir Koppaka <[email protected]> wrote:

> I think I need the sort of operation Jake described above -
> wherein I can call a function f on a vector of the whole matrix (the dataset
> here, which is sparse) in a distributed fashion. I'll see this in detail
> tomorrow. But any other pointers on this issue with reference to the
> MAHOUT-375.diff update are very welcome.
>
