Hi,

Here is what I think; please correct me if I am wrong.
1. At its core, since Nutch is a web crawler, there must be a BFS going on.
In local mode a simple sequential BFS would do, but in deploy mode we need a
distributed version of it. In the current version of Nutch this should be
implemented as a MapReduce program. My suggestion is to implement it as a
BSP program using Hama.

Advantages: BSP is a naturally suited model for graph algorithms; please see
[0] and [1]. IMO we should also see a performance improvement with Hama,
because an iterative MapReduce crawl has to write its state to HDFS and
re-read it on every cycle, whereas a BSP job keeps its state resident across
supersteps and only exchanges messages at the barriers. I have appended a
rough sketch of the superstep structure I have in mind below the quoted
thread.

[0] http://www.slideshare.net/chodakowski/processing-graphrelational-data-with-mapreduce-and-bulk-synchronous-parallel
[1] http://www.slideshare.net/udanax/apache-hama-an-introduction-tobulk-synchronization-parallel-on-hadoop-2699426

--
thanks and regards,
Apurv Verma
B. Tech. (CSE)
IIT Ropar

On Sun, Mar 25, 2012 at 2:50 AM, Mathijs Homminga <[email protected]> wrote:

> This is interesting, can you elaborate a bit more on this? In what way do
> you think Nutch could benefit from an implementation in Hama?
>
> Mathijs Homminga
>
> On 24 mrt. 2012, at 13:55, Apurv Verma wrote:
>
> > Hi,
> > Would the Nutch community be interested in integrating Nutch and Hama?
> > Apache Hama is a Bulk Synchronous Parallel programming model written on
> > top of HDFS, highly suited for graph algorithms.
> > Currently Nutch supports running with the MapReduce paradigm. If the
> > community is interested I would like to take it up as a GSoC project.
> >
> > --
> > thanks and regards,
> >
> > Apurv Verma
> > B. Tech. (CSE)
> > IIT Ropar
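To make the suggestion concrete, here is a minimal, self-contained sketch of
the superstep structure I have in mind. It deliberately models the BSP
pattern in plain Java on a single machine rather than using Hama's actual
API, and every class, name, and the toy graph are hypothetical, just for
illustration: each superstep expands the current frontier, routes newly
discovered pages to the peer that owns them, and ends with a barrier, which
is exactly the shape a distributed BFS takes in BSP.

import java.util.*;

// Toy single-process model of a BSP-style distributed BFS.
// Hypothetical names throughout; this is NOT Hama's real API.
public class BspBfsSketch {

    // Number of "peers" the link graph is partitioned across.
    static final int PEERS = 2;

    // page -> outlinks (a stand-in for Nutch's link structure)
    static final Map<String, List<String>> GRAPH = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("d"),
            "c", List.of("d", "e"),
            "d", List.of(),
            "e", List.of("a"));

    // Partitioning function: which peer owns a given page.
    static int owner(String page) {
        return Math.floorMod(page.hashCode(), PEERS);
    }

    public static void main(String[] args) {
        // Crawl depth of each page once discovered. In a real distributed
        // run each peer would hold only its own partition of this state;
        // a single map suffices here because messages are routed by owner.
        Map<String, Integer> depth = new HashMap<>();

        // Per-peer inboxes; messages carry (page, depth) pairs.
        List<List<Map.Entry<String, Integer>>> inbox = new ArrayList<>();
        for (int p = 0; p < PEERS; p++) inbox.add(new ArrayList<>());

        // Seed the frontier with the injected URL at depth 0.
        inbox.get(owner("a")).add(Map.entry("a", 0));

        int superstep = 0;
        boolean anyMessages = true;
        while (anyMessages) {
            // Outboxes filled during this superstep; delivered at the barrier.
            List<List<Map.Entry<String, Integer>>> outbox = new ArrayList<>();
            for (int p = 0; p < PEERS; p++) outbox.add(new ArrayList<>());

            // Each peer drains its inbox: first-seen pages join the frontier
            // and their outlinks are sent to the owning peers.
            for (int p = 0; p < PEERS; p++) {
                for (Map.Entry<String, Integer> msg : inbox.get(p)) {
                    String page = msg.getKey();
                    int d = msg.getValue();
                    if (depth.containsKey(page)) continue; // already seen
                    depth.put(page, d);
                    for (String link : GRAPH.getOrDefault(page, List.of())) {
                        outbox.get(owner(link)).add(Map.entry(link, d + 1));
                    }
                }
                inbox.get(p).clear();
            }

            // Barrier synchronization: deliver all messages, then the next
            // superstep begins; the crawl halts when no messages remain.
            anyMessages = false;
            for (int p = 0; p < PEERS; p++) {
                inbox.get(p).addAll(outbox.get(p));
                anyMessages |= !outbox.get(p).isEmpty();
            }
            superstep++;
        }
        System.out.println("supersteps=" + superstep + " depths=" + depth);
    }
}

In Nutch terms (roughly, as I understand the current cycle): the per-peer
inbox plays the role of the fetch list, and the barrier replaces the job
boundary between the generate/fetch/update steps. The point is that the
discovered-URL state stays in memory across supersteps instead of being
written to and re-read from HDFS on every cycle.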

