Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_inject

The comment on the change is:
Added example fixed some text

------------------------------------------------------------------------------
  === Usage ===
   nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Injector <crawldb> <urldir>
  
-   '''<crawldb>:''' Path to the crawldb directory.[[BR]]
+   '''<crawldb>:''' Path to the Crawl Database directory.[[BR]]
-   '''<urldir>:''' Path to the directory containing url files[[BR]]
+   '''<urldir>:''' Path to the directory containing flat text url files.[[BR]]
  
  === Configuration Files ===
   hadoop-default.xml[[BR]]
@@ -20, +20 @@

  The following properties directly affect how the Injector injects URLs.[[BR]]
   db.default.fetch.interval -- Sets the time in days between fetches.  Default: 30.0f.[[BR]]
   db.score.injected -- Sets the default score of the URL.  Default: 1.0f.[[BR]]
-  urlnormalizer.class -- Name of the class that normalizes injected urls. Default: ["org.apache.nutch.net.BasicUrlNormalizer"].[[BR]]
+  urlnormalizer.class -- Name of the class that normalizes injected urls. Default: ["org.apache.nutch.net.BasicUrlNormalizer"].
  
  === Other Files ===
   None.
  
  === Caveats and Notes ===
-  None.
+  <urldir> may contain one or more flat text url files.  These files should contain one url per line to inject into the Crawl Database.[[BR]][[BR]]
+ Example: [[BR]]
+ {{{
+ nutch-0.8-dev/bin/nutch inject /path/to/crawldb /path/to/url/dir
+ 
+ Files:
+ /path/to/url/dir/nutch.txt
+ /path/to/url/dir/hadoop.txt
+ /path/to/url/dir/wikis.txt
+ 
+ nutch.txt contents:
+ http://lucene.apache.org/nutch/
+ http://lucene.apache.org/nutch/tutorial.html
+ 
+ hadoop.txt contents:
+ http://lucene.apache.org/hadoop/
+ http://lucene.apache.org/hadoop/docs/api/
+ 
+ wikis.txt contents:
+ http://wiki.apache.org/hadoop/
+ http://wiki.apache.org/nutch/
+ http://wiki.apache.org/lucene/
+ }}}
+ In this case, the Injector would inject seven urls into the Crawl Database located at /path/to/crawldb.
+ 
  
  DevelopmentCommandLineOptions
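
The example in the Caveats and Notes section can be reproduced locally before running the Injector. The following is a minimal sketch, not part of the wiki page: it builds the three url files in a temporary directory (the `urldir` and `total` variable names are illustrative assumptions) and counts the non-empty lines, which is the number of urls listed for injection. It does not account for any normalization the Injector itself applies.

```shell
#!/bin/sh
# Create a temporary seed directory (hypothetical location, stands in for /path/to/url/dir).
urldir=$(mktemp -d)

# nutch.txt: one url per line.
printf '%s\n' \
  'http://lucene.apache.org/nutch/' \
  'http://lucene.apache.org/nutch/tutorial.html' > "$urldir/nutch.txt"

# hadoop.txt: one url per line.
printf '%s\n' \
  'http://lucene.apache.org/hadoop/' \
  'http://lucene.apache.org/hadoop/docs/api/' > "$urldir/hadoop.txt"

# wikis.txt: one url per line.
printf '%s\n' \
  'http://wiki.apache.org/hadoop/' \
  'http://wiki.apache.org/nutch/' \
  'http://wiki.apache.org/lucene/' > "$urldir/wikis.txt"

# Count non-empty lines across all url files; each such line is one url to inject.
total=$(cat "$urldir"/*.txt | grep -c .)
echo "urls listed for injection: $total"

# The inject command from the wiki example would then be run against this directory:
# nutch-0.8-dev/bin/nutch inject /path/to/crawldb "$urldir"
```

Counting the lines first is a cheap sanity check that every file follows the one-url-per-line layout before the Crawl Database is modified.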
  
