Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "bin/nutch inject" page has been changed by kiranchitturi:
http://wiki.apache.org/nutch/bin/nutch%20inject

New page:
Inject is an alias for org.apache.nutch.crawl.Injector

This class takes a flat file of URLs and adds them to the list of pages to be 
crawled. It is useful for bootstrapping the system. Each URL file contains one 
URL per line, optionally followed by custom metadata separated by tabs, with 
each metadata key separated from its value by '='.

Note that some metadata keys are reserved: 

''nutch.score'': sets a custom score for a specific URL

''nutch.fetchInterval'': sets a custom fetch interval for a specific URL

''userType'': this is not a reserved key but an example of custom metadata; 
any field name can be used and assigned a value. In the example here we use 
userType to refer to the nature of Nutch as an open source project.

e.g. (a single line, with the fields separated by tab characters)
http://www.xyz.org/ nutch.score=10 nutch.fetchInterval=2592000 userType=open_source
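
Because the fields must be separated by tab characters, which are easy to lose 
when copying text from a page or an email, it may be safer to write the seed 
file with printf. The following is a minimal sketch only: the directory name 
urls and the file name seed.txt are assumptions chosen for illustration, not 
names that Nutch requires.

```shell
# Hypothetical seed directory and file name; adjust to your own layout.
seed_dir="urls"
mkdir -p "$seed_dir"

# \t in the format string emits a literal tab between the URL
# and each key=value metadata pair.
printf '%s\t%s\t%s\t%s\n' \
  'http://www.xyz.org/' \
  'nutch.score=10' \
  'nutch.fetchInterval=2592000' \
  'userType=open_source' > "$seed_dir/seed.txt"

# Print each tab-separated field on its own line to check the separators.
tr '\t' '\n' < "$seed_dir/seed.txt"
```

If the last command prints the URL and the three key=value pairs on four 
separate lines, the tabs were written correctly.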

Usage: 
{{{
bin/nutch inject <crawldb> <url_dir>
}}}

'''<crawldb>''': The directory containing the crawldb

'''<url_dir>''': The directory containing the seed list (the 'flat file' 
referred to above), usually a text file containing one URL per line.
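
Putting the two arguments together, an invocation might look like the sketch 
below. The paths crawl/crawldb and urls are assumed names, not defaults, and 
the guard simply skips the call when the script is not run from a Nutch 
installation directory.

```shell
# Assumed locations; substitute the paths used by your own crawl.
crawldb="crawl/crawldb"
url_dir="urls"

if [ -x bin/nutch ]; then
  # Add every URL found in the seed directory to the crawldb.
  bin/nutch inject "$crawldb" "$url_dir"
else
  echo "bin/nutch not found; run this from the Nutch installation directory" >&2
fi
```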


CommandLineOptions
