Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "RunNutchInEclipse" page has been changed by SebastianNagel:
http://wiki.apache.org/nutch/RunNutchInEclipse?action=diff&rev1=35&rev2=36

Comment:
typos

  ##Original credits: RenaudRichardet
- 
  = RunNutchInEclipse =
  This page acts as a resource for working with Nutch from within the Eclipse 
IDE. It is intended to provide a comprehensive beginning resource for the 
configuration, building, crawling and debugging of Nutch trunk in the above 
context.
  
  == Tested with ==
   * Nutch trunk (version 1.5 @date 09112011)
   * Eclipse Indigo Service Release 1
-      Build id: 20110916-0149
+   . Build id: 20110916-0149
   * Java JDK 1.6.0_25
   * Ubuntu Release 11.04 (natty)
-      Kernel Linux 2.6.38-10-generic
+   . Kernel Linux 2.6.38-10-generic GNOME 2.32.1
-      GNOME 2.32.1
   * Windows Vista (Service Edition 2)
  
  == Before you start ==
@@ -20, +18 @@

  
  This tutorial covers a fully internal Eclipse/Nutch setup, using only 
Eclipse tools and associated plugins.
  
- == Prerequsites ==
+ == Prerequisites ==
-  * Grab the newest version of Eclipse availble 
[[http://www.eclipse.org/downloads/|here]].
+  * Grab the newest version of Eclipse available 
[[http://www.eclipse.org/downloads/|here]].
   * All of the following should be available from the 
[[http://marketplace.eclipse.org/marketplace-client-intro|Eclipse 
Marketplace]]. If they are not, you can install them from within Eclipse as 
follows.
-  * Once you've set up Eclipse, download Subclipse as per 
[[http://subclipse.tigris.org/|here]]. N.B. If you experience an error with the 
1.8.x release, try 1.6.x. This tends to solve compatibility problems.  
+  * Once you've set up Eclipse, download Subclipse as per 
[[http://subclipse.tigris.org/|here]]. N.B. If you experience an error with the 
1.8.x release, try 1.6.x. This tends to solve compatibility problems.
   * Grab IvyDE plugin for Eclipse as 
[[http://ant.apache.org/ivy/ivyde/download.cgi|here]].
   * Grab m2e plugin for Eclipse 
[[http://marketplace.eclipse.org/content/maven-integration-eclipse|here]]
  
@@ -31, +29 @@

  
  == Steps ==
  === Install Nutch ===
- Use the Subclipse plugin to check out the latest Nutch Trunk development. 
+ Use the Subclipse plugin to check out the latest Nutch Trunk development.
+ 
   * File > New > Project > SVN > Checkout Projects from SVN
   * Create new repository location > 
https://svn.apache.org/repos/asf/nutch/trunk
   * Subclipse will ask for some additional configuration options. At this 
stage, check out the trunk source as a project configured using the '''New 
Project Wizard'''. Ensure that you're checking out the HEAD revision, then 
proceed to Finish.
@@ -41, +40 @@

   * Do not build Nutch now. Make sure you have no .project and .classpath 
files in the Nutch directory and that Nutch has not built the /runtime 
directory. '''N.B.''' This is absolutely essential.
  
  === Establish the Eclipse environment for Nutch ===
- 
-  * Ensure that you're in the Package Explorer > right click on Trunk Project 
folder. 
+  * Ensure that you're in the Package Explorer > right click on Trunk Project 
folder.
   * The only Source folder will be trunk/src > '''Remove''' this folder > Add 
Folder > expand trunk/src and check src/bin, src/java, src/test & 
src/testresources.
-  * In additon, we must maunally add '''EVERY''' individual plugin src/java 
and src/test folder, although this takes some time it is absolutely essential 
that this is done.
+  * In addition, we must manually add '''EVERY''' individual plugin src/java 
and src/test folder, although this takes some time it is absolutely essential 
that this is done.
   * In the Libraries tab, click Add Class Folder and add src/conf to the 
classpath.
   * Still in the Libraries tab add JARs > 
src/plugin/urlfilter-automaton/lib/automaton.jar & 
src/plugin/parse-swf/lib/javaswf.jar
   * Remaining in the Libraries tab Add Library > IvyDE Managed Dependencies > 
browse to trunk/ivy/ivy.xml > ensure '''ALL''' configuration boxes are included.
   * Go to the "Order and Export" tab, find the entry for the added "conf" 
folder (it will most likely be at the bottom of the list) and move it to the 
top (by checking it and clicking the "Top" button). This is required so that 
Eclipse takes config resources (nutch-default.xml, etc.) from our "conf" folder 
and not from somewhere else.
   * DO NOT add "build" to classpath
   * Click the "Finish" button
- 
  
  === Configure Nutch ===
   * See the [[http://wiki.apache.org/nutch/NutchTutorial|Tutorial]] and follow 
all configuration steps, but '''DO NOT''' undertake any crawling. 
The directory structure for Nutch trunk enables us to edit 
nutch-site.xml.template, nutch-default.xml and regex-urlfilter.txt.template in 
our /conf directory; these properties will then be automatically built into our 
/runtime build folder.
@@ -60, +57 @@

  
  === Build Nutch ===
   * We can now progress to building Nutch by simply dragging the build.xml 
file into the Ant view and double clicking on the build file. If you configured 
the project correctly, Eclipse will build Nutch for you into "bin" and you 
should see something similar to the following:
+ 
  {{{
  BUILD SUCCESSFUL
  Total time: 33 seconds
@@ -106, +104 @@

  The Nutch source code must be outside the workspace folder. Alternatively, 
you can check out the code with Eclipse (svn) directly under your workspace 
rather than trying to create the project from existing code. Eclipse sometimes 
refuses to create a project from existing source code located inside the 
workspace.
  
  === plugin directory not found ===
- Make sure you set your plugin.folders property correct, instead of using a 
relative path you can use a absolute one as well in nutch-default.xml or even 
better in nutch-site.xml. Ideally all efforts should be made to keep 
nutch-defult.xml completely intact.
+ Make sure you set your plugin.folders property correct, instead of using a 
relative path you can use a absolute one as well in nutch-default.xml or even 
better in nutch-site.xml. Ideally all efforts should be made to keep 
nutch-default.xml completely intact.
  
  {{{
  <property>
    <name>plugin.folders</name>
    <value>/home/....../trunk/src/plugin</value>
  </property>
  }}}
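
  As a quick sanity check of the advice above, the following is a minimal sketch 
that reads a Hadoop-style configuration document, looks up plugin.folders and 
reports whether the value is an absolute path. The class name 
PluginFoldersCheck, its method names and the example path are hypothetical and 
not part of Nutch; it only uses the XML parser shipped with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, NOT part of Nutch: inspects a configuration document
// of the <configuration><property>...</property></configuration> form.
class PluginFoldersCheck {

    /** Returns the <value> of the named property, or null if it is absent. */
    static String propertyValue(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    /** True when the path is absolute on the current platform. */
    static boolean isAbsolute(String path) {
        return path != null && Paths.get(path).isAbsolute();
    }

    public static void main(String[] args) throws Exception {
        // The path below is an assumption for illustration only.
        String xml = "<configuration><property>"
                + "<name>plugin.folders</name>"
                + "<value>/opt/example/nutch/trunk/src/plugin</value>"
                + "</property></configuration>";
        String value = propertyValue(xml, "plugin.folders");
        System.out.println("plugin.folders = " + value);
        System.out.println("absolute: " + isAbsolute(value));
    }
}
```

  In practice you would pass in the contents of your own nutch-site.xml (and 
src/test/nutch-site.xml for unit tests) rather than the inline string used here.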
- 
  === No plugins loaded during unit tests in Eclipse ===
  During unit testing, Eclipse ignores conf/nutch-site.xml in favor of 
src/test/nutch-site.xml, so you might need to add the plugin directory 
configuration to that file as well.
  
@@ -123, +120 @@

  
  === debugging Hadoop classes ===
  Sometimes (fairly often) it makes sense to also have the Hadoop classes 
available during debugging. This should really be second nature, as Nutch 
relies heavily upon the underlying Hadoop infrastructure. You can therefore 
check out (svn) the Hadoop sources into your Eclipse IDE and debug both 
projects together. You can:
-   * Checkout the Hadoop version that should be used within Nutch trunk
-   * configure a Hadoop project similar to the Nutch project within your 
Eclipse IDE
-   * add the Hadoop project as a dependent project of Nutch project
-   * you can now also set break points within Hadoop classes like inputformat 
implementations etc.
  
+  * Checkout the Hadoop version that should be used within Nutch trunk
+  * configure a Hadoop project similar to the Nutch project within your 
Eclipse IDE
+  * add the Hadoop project as a dependent project of Nutch project
+  * you can now also set break points within Hadoop classes like inputformat 
implementations etc.
+ 
