On Thu, Mar 24, 2011 at 3:34 PM, Dhruv Kumar <[email protected]> wrote:

> 2. Another very interesting possibility is to express the BW as a recursive
> join.  There's a very interesting offshoot of Hadoop, called Haloop (
> http://code.google.com/p/haloop/) which supports loop control, and caching
> of the intermediate results on the mapper inputs, reducer inputs and
> reducer outputs to improve performance. The paper [1] describes this in more
> detail. They have implemented k-means as a recursive join.
>

Until there is flexibility in the execution model, such as that in the recent
MapReduce 2.0 announcement from Yahoo, and until that flexibility is pretty
much standard, it is hard to justify this.

The exception is where such extended capabilities fit into standard Hadoop
0.20 environments.


> In either case, I want to clearly define the scope and task list. BW will be
> the core of the project but:
>
> 1. Does it make sense to implement the "counting method" for model
> discovery as well? It is clearly inferior, but will it be a good reference
> for comparison to the BW? Any added benefit?
>

No opinion here except that increased scope decreases the probability of even
partial success.


> 2. What has been the standard in the past GSoC Mahout projects regarding
> unit testing and documentation?
>

Do it.

Seriously.

We use JUnit 4+ and very much prefer strong unit tests.  Nothing in what you
are proposing should require anything interesting in this regard.  Testing the
mapper, combiner and reducer in isolation is good.  Testing the integrated
program in local mode or pseudo-distributed mode should suffice beyond that.
It is best if you can separate command line argument parsing from the
execution path so that you can test them separately.
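
A minimal sketch of that separation, in plain Java with no Hadoop or Mahout
dependency.  Everything named here (TrainerCli, Options, the --input,
--output and --iterations flags, the default of 10 iterations) is made up
for illustration and is not an existing Mahout API; a real driver would
configure and submit the job inside run().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical driver: all names and flags are assumptions for illustration.
public class TrainerCli {

    /** Plain value object holding parsed options; no Hadoop types involved. */
    static final class Options {
        final String input;
        final String output;
        final int iterations;

        Options(String input, String output, int iterations) {
            this.input = input;
            this.output = output;
            this.iterations = iterations;
        }
    }

    /** Parses "--key value" pairs; throws IllegalArgumentException on bad input. */
    static Options parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("expected --key value pairs");
        }
        Map<String, String> values = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("not a flag: " + args[i]);
            }
            values.put(args[i].substring(2), args[i + 1]);
        }
        String input = values.get("input");
        String output = values.get("output");
        if (input == null || output == null) {
            throw new IllegalArgumentException("--input and --output are required");
        }
        // Assumed default of 10 iterations when the flag is absent.
        int iterations = Integer.parseInt(values.getOrDefault("iterations", "10"));
        return new Options(input, output, iterations);
    }

    /** Execution path; here it only describes the work it would do. */
    static String run(Options options) {
        return "train " + options.input + " -> " + options.output
                + " x" + options.iterations;
    }

    public static void main(String[] args) {
        System.out.println(run(parse(args)));
    }
}
```

With this shape, one test can feed string arrays to parse() and check the
resulting Options, and another can build an Options by hand and exercise
run() without going through the command line at all.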

>
> In the meantime, I've been understanding more about Mahout, Map Reduce and
> Hadoop's internals. One of my course projects this semester is to implement
> the Bellman Iteration algorithm on Map Reduce and so far it has been coming
> along well.
>
> Any feedback is much appreciated.
>
> Dhruv
>
