Hi,

Here is what I think; please correct me if I am wrong.
1. At its core, since Nutch is a web crawler, there must be a BFS going on.
In local mode a simple sequential BFS would do, but in deploy mode we need a
distributed version of it. In the current version of Nutch this should be
implemented as a MapReduce program. My suggestion is to implement it as a
BSP program using Hama.

Advantages: BSP is a naturally suited model for graph algorithms; please see
[0] and [1]. IMO we should also see a performance improvement with Hama,
because an iterative MapReduce crawl has to write its state to HDFS and
re-read it on every cycle, whereas a BSP job keeps its state resident across
supersteps and only exchanges messages at the barriers. I have appended a
rough sketch of the superstep structure I have in mind below the quoted
thread.

[0] http://www.slideshare.net/chodakowski/processing-graphrelational-data-with-mapreduce-and-bulk-synchronous-parallel
[1] http://www.slideshare.net/udanax/apache-hama-an-introduction-tobulk-synchronization-parallel-on-hadoop-2699426

--
thanks and regards,
Apurv Verma
B. Tech. (CSE)
IIT Ropar

On Sun, Mar 25, 2012 at 2:50 AM, Mathijs Homminga <[email protected]> wrote:

> This is interesting, can you elaborate a bit more on this? In what way do
> you think Nutch could benefit from an implementation in Hama?
>
> Mathijs Homminga
>
> On 24 mrt. 2012, at 13:55, Apurv Verma wrote:
>
> > Hi,
> > Would the Nutch community be interested in integrating Nutch and Hama?
> > Apache Hama is a Bulk Synchronous Parallel programming model written on
> > top of HDFS, highly suited for graph algorithms.
> > Currently Nutch supports running with the MapReduce paradigm. If the
> > community is interested I would like to take it up as a GSoC project.
> >
> > --
> > thanks and regards,
> >
> > Apurv Verma
> > B. Tech. (CSE)
> > IIT Ropar
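To make the suggestion concrete, here is a minimal, self-contained sketch of
the superstep structure I have in mind. It deliberately models the BSP
pattern in plain Java on a single machine rather than using Hama's actual
API, and every class, name, and the toy graph are hypothetical, just for
illustration: each superstep expands the current frontier, routes newly
discovered pages to the peer that owns them, and ends with a barrier, which
is exactly the shape a distributed BFS takes in BSP.

import java.util.*;

// Toy single-process model of a BSP-style distributed BFS.
// Hypothetical names throughout; this is NOT Hama's real API.
public class BspBfsSketch {

    // Number of "peers" the link graph is partitioned across.
    static final int PEERS = 2;

    // page -> outlinks (a stand-in for Nutch's link structure)
    static final Map<String, List<String>> GRAPH = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("d"),
            "c", List.of("d", "e"),
            "d", List.of(),
            "e", List.of("a"));

    // Partitioning function: which peer owns a given page.
    static int owner(String page) {
        return Math.floorMod(page.hashCode(), PEERS);
    }

    public static void main(String[] args) {
        // Crawl depth of each page once discovered. In a real distributed
        // run each peer would hold only its own partition of this state;
        // a single map suffices here because messages are routed by owner.
        Map<String, Integer> depth = new HashMap<>();

        // Per-peer inboxes; messages carry (page, depth) pairs.
        List<List<Map.Entry<String, Integer>>> inbox = new ArrayList<>();
        for (int p = 0; p < PEERS; p++) inbox.add(new ArrayList<>());

        // Seed the frontier with the injected URL at depth 0.
        inbox.get(owner("a")).add(Map.entry("a", 0));

        int superstep = 0;
        boolean anyMessages = true;
        while (anyMessages) {
            // Outboxes filled during this superstep; delivered at the barrier.
            List<List<Map.Entry<String, Integer>>> outbox = new ArrayList<>();
            for (int p = 0; p < PEERS; p++) outbox.add(new ArrayList<>());

            // Each peer drains its inbox: first-seen pages join the frontier
            // and their outlinks are sent to the owning peers.
            for (int p = 0; p < PEERS; p++) {
                for (Map.Entry<String, Integer> msg : inbox.get(p)) {
                    String page = msg.getKey();
                    int d = msg.getValue();
                    if (depth.containsKey(page)) continue; // already seen
                    depth.put(page, d);
                    for (String link : GRAPH.getOrDefault(page, List.of())) {
                        outbox.get(owner(link)).add(Map.entry(link, d + 1));
                    }
                }
                inbox.get(p).clear();
            }

            // Barrier synchronization: deliver all messages, then the next
            // superstep begins; the crawl halts when no messages remain.
            anyMessages = false;
            for (int p = 0; p < PEERS; p++) {
                inbox.get(p).addAll(outbox.get(p));
                anyMessages |= !outbox.get(p).isEmpty();
            }
            superstep++;
        }
        System.out.println("supersteps=" + superstep + " depths=" + depth);
    }
}

In Nutch terms (roughly, as I understand the current cycle): the per-peer
inbox plays the role of the fetch list, and the barrier replaces the job
boundary between the generate/fetch/update steps. The point is that the
discovered-URL state stays in memory across supersteps instead of being
written to and re-read from HDFS on every cycle.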

