+1 for attempting this, but beware: DistributedRowMatrix uses map-side joins, and I'm not sure those are supported in the 0.20+ API. In fact, I have specifically run into problems because of this when I tried it in the past.
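
For anyone less familiar with the distinction, here is a minimal sketch of the idea behind a map-side join. It is plain Java, not Hadoop or Mahout code, and every name in it (`MergeJoinSketch`, `Row`, `mergeJoin`) is made up for illustration. The assumption is that both inputs are already sorted by row key, with each key appearing at most once per input, so matching rows can be paired in a single pass with no shuffle or reduce step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the DistributedRowMatrix implementation.
class MergeJoinSketch {

    /** One input record: a row key plus an opaque payload. */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Joins two inputs that are each sorted by key (keys unique per input).
     * Both lists are walked once, in step, which is why no second pass
     * over the data is needed.
     */
    static List<String> mergeJoin(List<Row> left, List<Row> right) {
        List<String> joined = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            Row l = left.get(i);
            Row r = right.get(j);
            if (l.key < r.key) {
                i++;            // no partner for this left row
            } else if (l.key > r.key) {
                j++;            // no partner for this right row
            } else {
                joined.add(l.key + ":" + l.value + "," + r.value);
                i++;
                j++;
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Row> a = List.of(new Row(0, "a0"), new Row(1, "a1"), new Row(3, "a3"));
        List<Row> b = List.of(new Row(1, "b1"), new Row(2, "b2"), new Row(3, "b3"));
        System.out.println(mergeJoin(a, b));
    }
}
```

A reduce-side join drops the sortedness assumption: it tags each record with its source, lets the shuffle group records by key, and pairs them in the reducer, so the real work can only start in a later step.
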
Now, some methods can simply get slower by taking a two-pass approach (a reduce-side join plus a second pass) to problems that are solvable in one pass, but a second pass over the data is a pretty bitter pill to swallow. Finding a way to do a map-side join in 0.20 would be nicer, if possible.

  -jake

On Sat, Sep 4, 2010 at 8:02 AM, Jeff Eastman <[email protected]> wrote:

> +1 A user mandate, a motivated developer, perfect. You have my support
> Shannon, let me know if you run into problems.
>
>
> On 9/3/10 12:17 PM, Shannon Quinn wrote:
>
>> Apologies for missing this; I was actually very interested in doing the
>> DRM porting to 20.2, considering how much my GSoC project relies on it.
>>
>> Unless someone has already volunteered...in which case I'd love to help :)
>>
>> Shannon
>>
>> Apologies for the brevity, this was sent from my iPhone
>>
>> On Sep 3, 2010, at 15:11, Sebastian Schelter <[email protected]> wrote:
>>
>>> I'd like to see it ported, so RowSimilarityJob can become a method of
>>> DistributedRowMatrix.
>>>
>>> On 03.09.2010 20:48, Jeff Eastman wrote:
>>>
>>>> Is anybody working on this? Has anybody else looked at it? It seems
>>>> to have a few unported dependencies like some of the classifiers.
>>>>
>>>
>
