On Tue, Apr 26, 2011 at 10:33 AM, Jake Mannix <[email protected]> wrote:
> But it doesn't run as a Hadoop job?  It's embarrassingly parallel, right,
> and the hashes could be IntWritable or LongWritable, seems to pretty
> naturally fit in Mahout in this way.

It depends on the use, I think.  The basic program should fit either way, but
having a very large in-memory structure for the search makes it fit map-reduce
a little less well.  Good use of mmap here might make it fit well.

> > Thus, its natural shape seems to me to be a web service, not a
> > map-reduce thing.  Are we interested as a project?  Does this
> > description make any sense?
>
> Maybe your impl seems more of a web service, but I've naturally run
> it more as a big batch operation: single pass over the data, send
> everybody to their hash buckets in the mapper, and then use the
> reducer just to sort by bucket while tagging.  Optionally build
> Bloom filters in the reducer too, to compress your clusters.

I think that this is a good fit as well.
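
For concreteness, here is a rough sketch of the batch shape Jake describes: the
mapper sends each record to its hash bucket (a LongWritable key), and the
reducer sees all members of a bucket together and emits them tagged with the
bucket id.  The class names, the bucket function, and the I/O formats below are
placeholders for illustration, not anything that exists in Mahout today.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class BucketJob {

      public static class BucketMapper
          extends Mapper<LongWritable, Text, LongWritable, Text> {
        private final LongWritable bucket = new LongWritable();

        @Override
        protected void map(LongWritable offset, Text record, Context context)
            throws IOException, InterruptedException {
          // Placeholder bucket function: a real LSH scheme would compute one
          // or more hash signatures from the record's features instead.
          bucket.set(record.toString().hashCode() & 0xffffL);
          context.write(bucket, record);
        }
      }

      public static class BucketReducer
          extends Reducer<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void reduce(LongWritable bucket, Iterable<Text> members,
                              Context context)
            throws IOException, InterruptedException {
          // Everything sharing a bucket arrives together; emit each member
          // tagged with its bucket id.  A Bloom filter per bucket could be
          // built here instead, to compress the cluster representation.
          for (Text member : members) {
            context.write(bucket, member);
          }
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "lsh-bucketing");
        job.setJarByClass(BucketJob.class);
        job.setMapperClass(BucketMapper.class);
        job.setReducerClass(BucketReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The web-service shape would keep the buckets in memory (or mmapped) and probe
them per query instead of doing the single sorting pass, which is where the
two uses diverge.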
