Excellent, Julien. Excuse me for not picking this up from your blog. I took your comment a few weeks ago regarding 'large crawls' a bit too lightly ;0)
This puts a big smile on my face.

Ta for now

Lewis

On Thu, Nov 17, 2011 at 5:39 PM, Julien Nioche <[email protected]> wrote:

> We (DigitalPebble) managed the crawl for them and wrote the custom bits
> they required. The problems they mentioned were more related to EC2 than
> Hadoop as such. More on
> http://digitalpebble.blogspot.com/2010/09/similarpages-is-out.html
>
> Jul
>
> On 17 November 2011 16:57, Lewis John Mcgibbney <[email protected]> wrote:
>
>> Hi,
>>
>> Some more positives here.
>>
>> Lewis
>>
>> ---------- Forwarded message ----------
>> From: Pietro Borradori <[email protected]>
>> Date: Thu, Nov 17, 2011 at 4:46 PM
>> Subject: Fw: Lewis John McGibbney sent a message via SimilarPages – A web
>> discovery and search add-on
>> To: "[email protected]" <[email protected]>
>> Cc: Marco Laurita <[email protected]>
>>
>> Hi Lewis,
>>
>> Thanks for your email... I'm sorry to reply to you late...
>> Nutch is a fundamental piece of the SimilarPages architecture, because of its
>> crawling features and for the solid base on which it is built, that is
>> Hadoop. Hadoop allows us to make all the computations on the crawled data,
>> it is really a fantastic project! Hadoop gives us some headaches sometimes
>> when we need large clusters to perform the computation on the crawled data,
>> especially when there are some instances with hardware failures, where
>> Hadoop is supposed to overcome such situations without problems. Marco,
>> co-founder/CTO of SimilarPages, is at your disposal for any deeper insight re
>> Nutch/Hadoop implementation if helpful.
>>
>> Here is the page of our site re Nutch/Hadoop
>>
>> http://www.similarpages.com/web/index.php?option=com_content&view=article&id=8&Itemid=20
>>
>> We liked the Nutch/Hadoop projects on our 2 official FB pages:
>> http://www.facebook.com/pages/SimilarPagescom/303352486359786?sk=wall
>>
>> http://www.facebook.com/pages/SimilarPages-A-web-discovery-and-search-addon/149182788451193
>>
>> A take-a-tour video here...
>>
>> http://www.similarpages.com/web/index.php?option=com_content&view=article&id=15&Itemid=4
>>
>> You can follow me on twitter @MrCappuccini
>>
>> We've finally released the beta of the SimilarPages search engine!! Check
>> it out at www.similarpages.com and let us know what you think!!
>>
>> my best
>> Pietro
>>
>> Pietro Borradori
>> Founder & CEO
>>
>> [image: http://www.similarpages.com/images/Loghetto_posta.jpg]
>>
>> ------------------------------
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com

--
*Lewis*

