Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_inject

The comment on the change is:
Added example fixed some text

------------------------------------------------------------------------------
  === Usage ===
  nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Injector <crawldb> <urldir>

- '''<crawldb>:''' Path to the crawldb directory.[[BR]]
+ '''<crawldb>:''' Path to the Crawl Database directory.[[BR]]
- '''<urldir>:''' Path to the directory containing url files[[BR]]
+ '''<urldir>:''' Path to the directory containing flat text url files.[[BR]]

  === Configuration Files ===
  hadoop-default.xml[[BR]]
@@ -20, +20 @@
  The following properties directly affect how the Injector injects URLs.[[BR]]
  db.default.fetch.interval -- Sets the time in days between fetches. Default: 30.0f.[[BR]]
  db.score.injected -- Sets the default score of the URL. Default: 1.0f.[[BR]]
- urlnormalizer.class -- Name of the class that normalizes injected urls. Default: ["org.apache.nutch.net.BasicUrlNormalizer"].[[BR]]
+ urlnormalizer.class -- Name of the class that normalizes injected urls. Default: ["org.apache.nutch.net.BasicUrlNormalizer"].

  === Other Files ===
  None.

  === Caveats and Notes ===
- None.
+ <urldir> may contain one or more flat text url files. These files should contain one url per line to inject into the Crawl Database.[[BR]][[BR]]
+ Example: [[BR]]
+ {{{
+ nutch-0.8-dev/bin/nutch inject /path/to/crawldb /path/to/url/dir
+
+ Files:
+ /path/to/url/dir/nutch.txt
+ /path/to/url/dir/hadoop.txt
+ /path/to/url/dir/wikis.txt
+
+ nutch.txt contents:
+ http://lucene.apache.org/nutch/
+ http://lucene.apache.org/nutch/tutorial.html
+
+ hadoop.txt contents:
+ http://lucene.apache.org/hadoop/
+ http://lucene.apache.org/hadoop/docs/api/
+
+ wikis.txt contents:
+ http://wiki.apache.org/hadoop/
+ http://wiki.apache.org/nutch/
+ http://wiki.apache.org/lucene/
+ }}}
+ In this case seven urls would be injected into the Crawl Database located at /path/to/crawldb by the Injector.
+ DevelopmentCommandLineOptions
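
The url directory layout described under Caveats and Notes can be reproduced with a short shell sketch. It is an illustration only: the scratch directory from `mktemp` stands in for the page's placeholder /path/to/url/dir, the variable names `url_dir` and `url_count` are made up for this example, and counting non-blank lines is an assumption about how to tally "one url per line", not a description of what the Injector does internally. The inject command itself is left as a comment because it needs a Nutch checkout.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for /path/to/url/dir.
url_dir="$(mktemp -d)"

# One flat text file per topic, one url per line, as in the page's example.
cat > "$url_dir/nutch.txt" <<'EOF'
http://lucene.apache.org/nutch/
http://lucene.apache.org/nutch/tutorial.html
EOF

cat > "$url_dir/hadoop.txt" <<'EOF'
http://lucene.apache.org/hadoop/
http://lucene.apache.org/hadoop/docs/api/
EOF

cat > "$url_dir/wikis.txt" <<'EOF'
http://wiki.apache.org/hadoop/
http://wiki.apache.org/nutch/
http://wiki.apache.org/lucene/
EOF

# Tally non-blank lines across every file in the directory
# (assumption: each non-blank line is one url to inject).
url_count="$(cat "$url_dir"/*.txt | grep -c .)"
echo "urls to inject: $url_count"

# With a Nutch checkout available, the directory would then be passed to
# the command quoted on the page (crawldb path is the page's placeholder):
#   nutch-0.8-dev/bin/nutch inject /path/to/crawldb "$url_dir"
```

Run as written, the tally comes to seven, matching the count given in the page's example.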
