Dear wiki user,

You have subscribed to a wiki page "Nutch Wiki" for change notification.

The page Crawl has been reverted to revision 10 by cmd.
http://wiki.apache.org/nutch/Crawl?action=diff&rev1=11&rev2=12

--------------------------------------------------

  The work done by this script is broadly divided into 8 steps.
  
   1. Inject URLs
-  1. Generate, Fetch, Parse, Update Loop
+  2. Generate, Fetch, Parse, Update Loop
-  1. Merge Segments
+  3. Merge Segments
-  1. Invert Links
+  4. Invert Links
-  1. Index
+  5. Index
-  1. Dedup
+  6. Dedup
-  1. Merge Indexes
+  7. Merge Indexes
-  1. Load new indexes
+  8. Load new indexes
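
To make the ordering concrete, here is a dry-run outline of the eight steps. It is only a sketch: it prints step names and runs no Nutch commands. The `step` and `plan` functions and the `depth` variable (an assumed number of loop iterations) are hypothetical names chosen for illustration, not taken from the script.

```shell
#!/bin/sh
# Dry-run outline of the eight steps listed above.
# Nothing here calls Nutch; each step is only printed.

depth=2   # assumption: how many times the step 2 loop repeats

# Hypothetical helper: print one step number and its description.
step() {
  echo "step $1: $2"
}

# Hypothetical helper: print the whole plan in order.
plan() {
  step 1 "inject URLs"
  i=1
  while [ "$i" -le "$depth" ]
  do
    step 2 "generate, fetch, parse, update (iteration $i of $depth)"
    i=$((i + 1))
  done
  step 3 "merge segments"
  step 4 "invert links"
  step 5 "index"
  step 6 "dedup"
  step 7 "merge indexes"
  step 8 "load new indexes"
}

plan
```

Only step 2 repeats; every other step runs once per invocation in this outline.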
  
  == Modes of Execution ==
  The script can be executed in two modes:
- 
   * Normal Mode
   * Safe Mode
  
@@ -43, +42 @@

  then
    NUTCH_HOME=.
  }}}
+ 
  Set 'NUTCH_HOME' to the path of the Nutch directory. This assignment only takes 
effect when 'NUTCH_HOME' is not already set as an environment variable; if it is 
set in the environment, the assignment above is ignored.
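
The "use the environment value if present, otherwise fall back to a default" behaviour can be sketched in isolation as follows. `default_if_unset` is a hypothetical helper written for this illustration and is not part of the script; the fallback `.` matches the assignment shown above.

```shell
#!/bin/sh
# Sketch only: default_if_unset is a hypothetical helper, not part of runbot.
# It prints the first argument if it is non-empty, otherwise the second.
default_if_unset() {
  if [ -z "$1" ]
  then
    echo "$2"
  else
    echo "$1"
  fi
}

# Keep an existing NUTCH_HOME; fall back to the current directory otherwise.
NUTCH_HOME=$(default_if_unset "$NUTCH_HOME" .)
echo "NUTCH_HOME=$NUTCH_HOME"
```

Running it as `NUTCH_HOME=/opt/nutch sh example.sh` (the path and file name are placeholders) keeps the exported value, while running it with the variable unset falls back to `.`. The POSIX parameter expansion `${NUTCH_HOME:-.}` should give the same result in a single expression.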
  
  === CATALINA_HOME ===
@@ -53, +53 @@

  then
    CATALINA_HOME=/opt/apache-tomcat-6.0.10
  }}}
+ 
  As with 'NUTCH_HOME', if 'CATALINA_HOME' is already set in the environment, 
the assignment above is ignored.
  
  == Can it re-crawl? ==
  The author has used this script to re-crawl a couple of times, but it has not 
been tested for re-crawling under real-world conditions. You are welcome to try 
it for re-crawling; whether or not it works properly, please let us know.
  
  == Script ==
+ {{{
- {{{#!/bin/sh
+ #!/bin/sh
  
  # runbot script to run the Nutch bot for crawling and re-crawling.
  # Usage: bin/runbot [safe]
@@ -88, +90 @@

  then
    NUTCH_HOME=.
    echo runbot: $0 could not find environment variable NUTCH_HOME
-   echo runbot: NUTCH_HOME=$NUTCH_HOME has been set by the script
+   echo runbot: NUTCH_HOME=$NUTCH_HOME has been set by the script 
  else
-   echo runbot: $0 found environment variable NUTCH_HOME=$NUTCH_HOME
+   echo runbot: $0 found environment variable NUTCH_HOME=$NUTCH_HOME 
  fi
  
  if [ -z "$CATALINA_HOME" ]
  then
    CATALINA_HOME=/opt/apache-tomcat-6.0.10
    echo runbot: $0 could not find environment variable CATALINA_HOME
-   echo runbot: CATALINA_HOME=$CATALINA_HOME has been set by the script
+   echo runbot: CATALINA_HOME=$CATALINA_HOME has been set by the script 
  else
-   echo runbot: $0 found environment variable CATALINA_HOME=$CATALINA_HOME
+   echo runbot: $0 found environment variable CATALINA_HOME=$CATALINA_HOME 
  fi
  
  if [ -n "$topN" ]
