On Thu, Mar 24, 2011 at 3:34 PM, Dhruv Kumar <[email protected]> wrote:
> 2. Another very interesting possibility is to express the BW as a recursive
> join. There's a very interesting offshoot of Hadoop, called Haloop
> (http://code.google.com/p/haloop/) which supports loop control, and caching
> of the intermediate results on the mapper inputs, reducer inputs and
> reducer outputs to improve performance. The paper [1] describes this in
> more detail. They have implemented k-means as a recursive join.
>

Until there is flexibility around the execution model, such as the recent
MapReduce 2.0 announcement from Yahoo, and until that flexibility is pretty
much standard, it is hard to justify this. The exception is where such
extended capabilities fit into standard Hadoop 0.20 environments.

> In either case, I want to clearly define the scope and task list. BW will
> be the core of the project but:
>
> 1. Does it make sense for implementing the "counting method" for model
> discovery as well? It is clearly inferior but will it be a good reference
> for comparison to the BW. Any added benefit?
>

No opinion here except that increased scope decreases the probability of even
partial success.

> 2. What has been the standard in the past GSoC Mahout projects regarding
> unit testing and documentation?
>

Do it. Seriously. We use JUnit 4+ and very much prefer strong unit tests.
Nothing in what you are proposing should require anything interesting in this
regard. Testing the mapper, combiner and reducer in isolation is good. Testing
the integrated program in local mode or pseudo-distributed mode should suffice
beyond that.

It is best if you can separate command line argument parsing from the
execution path so that you can test them separately.

>
> In the meantime, I've been understanding more about Mahout, Map Reduce and
> Hadoop's internals. One of my course projects this semester is to implement
> the Bellman Iteration algorithm on Map Reduce and so far it has been coming
> along well.
>
> Any feedback is much appreciated.
>
> Dhruv
>
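
To make the point about separating command line argument parsing from the
execution path concrete, here is a minimal sketch in plain Java. Everything in
it is an assumption made for illustration: the class names TrainerOptions and
TrainerDriver, the flags --input, --output, --states and --iterations, the
default of 10 iterations, and the iteration-N directory layout are all
hypothetical and are not Mahout or Hadoop APIs. The driver is a stand-in that
only plans output directories; a real one would configure and submit jobs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical options for a Baum-Welch training job. All names are illustrative.
final class TrainerOptions {
    final String input;
    final String output;
    final int hiddenStates;
    final int maxIterations;

    // Package-private so tests of the execution path can build options directly.
    TrainerOptions(String input, String output, int hiddenStates, int maxIterations) {
        this.input = input;
        this.output = output;
        this.hiddenStates = hiddenStates;
        this.maxIterations = maxIterations;
    }

    // Pure parsing step: "--key value" pairs in, validated options out.
    // It touches neither Hadoop nor the file system, so it is easy to unit test.
    static TrainerOptions parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("expected --key value pairs");
        }
        Map<String, String> kv = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("not a flag: " + args[i]);
            }
            kv.put(args[i].substring(2), args[i + 1]);
        }
        int states = Integer.parseInt(required(kv, "states"));
        // Assumed default of 10 iterations when the flag is absent.
        int iterations = Integer.parseInt(kv.getOrDefault("iterations", "10"));
        if (states < 1 || iterations < 1) {
            throw new IllegalArgumentException("states and iterations must be positive");
        }
        return new TrainerOptions(required(kv, "input"), required(kv, "output"),
                                  states, iterations);
    }

    private static String required(Map<String, String> kv, String key) {
        String value = kv.get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing --" + key);
        }
        return value;
    }
}

// Execution path: consumes parsed options only. A real driver would configure
// and submit one MapReduce job per iteration here; this stand-in just plans
// the per-iteration output directories so the logic can be checked locally.
final class TrainerDriver {
    static List<String> planIterationOutputs(TrainerOptions options) {
        List<String> outputs = new ArrayList<>();
        for (int i = 1; i <= options.maxIterations; i++) {
            outputs.add(options.output + "/iteration-" + i);
        }
        return outputs;
    }
}
```

With this split, one set of JUnit tests can feed string arrays to
TrainerOptions.parse and check the resulting fields or the thrown exception,
while another set can construct a TrainerOptions directly and exercise the
driver without going through the command line at all.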
