Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "Crawl" page has been changed by cmd.
The comment on this change is: Does anybody can help me translate it to windows bat script. thanks.
http://wiki.apache.org/nutch/Crawl?action=diff&rev1=10&rev2=11

--------------------------------------------------

  The complete job of this script has been divided broadly into 8 steps.

   1. Inject URLs
-  2. Generate, Fetch, Parse, Update Loop
+  1. Generate, Fetch, Parse, Update Loop
-  3. Merge Segments
+  1. Merge Segments
-  4. Invert Links
+  1. Invert Links
-  5. Index
+  1. Index
-  6. Dedup
+  1. Dedup
-  7. Merge Indexes
+  1. Merge Indexes
-  8. Load new indexes
+  1. Load new indexes

  == Modes of Execution ==
  The script can be executed in two modes:-
+
   * Normal Mode
   * Safe Mode

@@ -42, +43 @@

  then
    NUTCH_HOME=.
  }}}
-
  Set 'NUTCH_HOME' to the path of the Nutch directory (if you are not setting it as an environment variable, since if environment variable is set, the above assignment is ignored).

  === CATALINA_HOME ===
@@ -53, +53 @@

  then
    CATALINA_HOME=/opt/apache-tomcat-6.0.10
  }}}
-
  Similar to the previous section, if this variable is set in the environment, then the above assignment is ignored.

  == Can it re-crawl? ==
  The author has used this script to re-crawl a couple of times. However, no real world testing has been done for re-crawling. Therefore, you may try to use the script for re-crawl. If it works fine or it doesn't work properly for re-crawl, please let us know.

  == Script ==
- {{{
- #!/bin/sh
+ {{{#!/bin/sh

  # runbot script to run the Nutch bot for crawling and re-crawling.
  # Usage: bin/runbot [safe]
@@ -90, +88 @@

  then
    NUTCH_HOME=.
    echo runbot: $0 could not find environment variable NUTCH_HOME
-   echo runbot: NUTCH_HOME=$NUTCH_HOME has been set by the script
+   echo runbot: NUTCH_HOME=$NUTCH_HOME has been set by the script
  else
-   echo runbot: $0 found environment variable NUTCH_HOME=$NUTCH_HOME
+   echo runbot: $0 found environment variable NUTCH_HOME=$NUTCH_HOME
  fi

  if [ -z "$CATALINA_HOME" ]
  then
    CATALINA_HOME=/opt/apache-tomcat-6.0.10
    echo runbot: $0 could not find environment variable NUTCH_HOME
-   echo runbot: CATALINA_HOME=$CATALINA_HOME has been set by the script
+   echo runbot: CATALINA_HOME=$CATALINA_HOME has been set by the script
  else
-   echo runbot: $0 found environment variable CATALINA_HOME=$CATALINA_HOME
+   echo runbot: $0 found environment variable CATALINA_HOME=$CATALINA_HOME
  fi

  if [ -n "$topN" ]

