We (DigitalPebble) managed the crawl for them and wrote the custom bits they required. The problems they mentioned were more related to EC2 than Hadoop as such. More on http://digitalpebble.blogspot.com/2010/09/similarpages-is-out.html
Jul

On 17 November 2011 16:57, Lewis John Mcgibbney <[email protected]> wrote:

> Hi,
>
> Some more positives here.
>
> Lewis
>
> ---------- Forwarded message ----------
> From: Pietro Borradori <[email protected]>
> Date: Thu, Nov 17, 2011 at 4:46 PM
> Subject: Fw: Lewis John McGibbney sent a message via SimilarPages – A web
> discovery and search add-on
> To: "[email protected]" <[email protected]>
> Cc: Marco Laurita <[email protected]>
>
>
> Hi Lewis,
>
> Thanks for your email... I'm sorry to reply to you late...
> Nutch is a fundamental piece of SimilarPages architecture, because of its
> crawling features and for the solid base on which it is built that is
> Hadoop. Hadoop allows us to make all the computations on the crawled data,
> it is really a fantastic project! Hadoop gives us some headache sometimes
> when we need large clusters to perform the computation on the crawled data,
> especially when there are some instances with hardware failures where
> Hadoop is supposed to overcome such situations without problems. Marco
> co-founder/CTO of SimilarPages is at your disposal for any deeper insight re
> Nutch/Hadoop implementation if helpful.
>
> Here is the page of our site re Nutch/Hadoop
>
> http://www.similarpages.com/web/index.php?option=com_content&view=article&id=8&Itemid=20
>
> We liked Nutch/Hadoop projects in our 2 official FB pages:
> http://www.facebook.com/pages/SimilarPagescom/303352486359786?sk=wall
>
> http://www.facebook.com/pages/SimilarPages-A-web-discovery-and-search-addon/149182788451193
>
> A take-a-tour video here...
>
> http://www.similarpages.com/web/index.php?option=com_content&view=article&id=15&Itemid=4
>
> You can follow me on twitter @MrCappuccini
>
> We've finally released the beta of the SimilarPages search engine!! Check
> it out at www.similarpages.com and let us know what you think!!
>
> my best
> Pietro
>
> Pietro Borradori
> Founder & CEO
>
> [image: http://www.similarpages.com/images/Loghetto_posta.jpg]
>
> ------------------------------
>
>

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

