The "NutchHadoopTutorial" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/NutchHadoopTutorial?action=diff&rev1=32&rev2=33

  Ok lets have some fun!
+ == Hadoop Cluster Setup ==
- == Network Setup ==
- --------------------------------------------------------------------------------
  It is important to know that you don't have to have big hardware to get up and running with Nutch and Hadoop. The architecture was designed in such a way to make the most of commodity hardware. For the purpose of this tutorial the nodes in the 6 node cluster are named as follows:
@@ -40, +39 @@
  To begin, our master node is devcluster01, by master node I mean that it will run the Hadoop services that coordinate with the slave nodes (all of the other computers) and it is the machine on which we performed our crawl.
+ == Downloading Hadoop and Nutch ==
- == Downloading Nutch and Hadoop ==
- --------------------------------------------------------------------------------
- Both Nutch and Hadoop are downloadable from the Apache website. The necessary Hadoop files are bundled with Nutch so unless you are going to be developing Hadoop you only need to download Nutch.
+ Both Nutch and Hadoop are downloadable from their respective Apache websites.
- We built Nutch from source after downloading it from its subversion repository.
- Nightly builds of Nutch can be found here:
- http://hudson.zones.apache.org/hudson/job/Nutch-trunk/
+ You can check out the latest and greatest Nutch source from its SVN repository [[http://svn.apache.org/repos/asf/nutch/trunk/|here]]. Alternatively, pick up a stable release from the Nutch site. The same should be done with Hadoop, and as mentioned earlier, this, along with how to set up your 6 node cluster, is included in the [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]].
+ We are going to use ant to build it, so if you have java and ant installed you should be fine. This tutorial is not going to go into how to install java or ant; if you want a complete reference for ant, pick up Erik Hatcher's book [[http://www.manning.com/hatcher|Java Development with Ant]].
- At time of writing this version (Jun 2010) Nutch includes Hadoop Jars version 0.20.2
-
- You can get a packaged tarball or extract from subversion. Knowing how to use tar or subversion is outside of the scope of this tutorial. Once you have a subversion client you can either browse the Nutch subversion webpage at:
-
- http://nutch.apache.org/version_control.html
-
- Or you can access the Nutch subversion repository through the client at:
-
- http://svn.apache.org/repos/asf/nutch/ (previously at http://svn.apache.org/repos/asf/lucene/nutch/ when Nutch was a part of Lucene)
-
- We are going to use ant to build it so if you have java and ant installed you should be fine.
-
- I am not going to go into how to install java or ant, if you are working with this level of software you should know how to do that and there are plenty of tutorial on building software with ant. If you want a complete reference for ant pick up Erik Hatcher's book "''Java Development with Ant''":
-
- http://www.manning.com/hatcher
-
- It is worth noting that previous versions of Nutch came already built. But nowadays the release is just source code and so does have to be built before use.
  == Building Nutch and Hadoop ==
  --------------------------------------------------------------------------------

