Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by JuhoMäkinen:
http://wiki.apache.org/nutch/bin/nutch_generate

New page:

The generate command creates a new fetchlist from the webdb. The fetchlist
contains the URLs that the fetch tool will fetch.

Usage: FetchListTool (-local | -ndfs <namenode:port>) <db>  <segment_dir> 
[-refetchonly] [-topN N] [-cutoff cutoffscore] [-numFetchers numFetchers] 
[-adddays numDays]

Command line parameters:

'''-topN N''' where N is the maximum number of pages to put in the fetchlist.

Normally, the "generate" command prepares a fetchlist from all unfetched
pages and from pages whose fetch interval has already expired. With -topN,
you get only the N URLs with the highest scores instead of every eligible
URL. These are potentially the most interesting pages, so they should be
prioritized in fetching.
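
The selection described above can be sketched in a few lines of Python. This
is only an illustration of the idea, not the FetchListTool code: the Page
record, its fields, the day-number clock and the generate_fetchlist function
are all hypothetical names made up for this example.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    # Hypothetical record; the real webdb entry is assumed to hold more than this.
    url: str
    score: float
    next_fetch_day: Optional[int]  # None means the page has never been fetched


def generate_fetchlist(pages: List[Page], today: int,
                       top_n: Optional[int] = None) -> List[str]:
    # Candidates: never-fetched pages plus pages whose fetch interval has expired.
    due = [p for p in pages
           if p.next_fetch_day is None or p.next_fetch_day <= today]
    if top_n is None:
        # No limit: every eligible page goes into the fetchlist.
        return [p.url for p in due]
    # With a limit: keep only the N highest-scoring candidates.
    best = heapq.nlargest(top_n, due, key=lambda p: p.score)
    return [p.url for p in best]


pages = [
    Page("http://example.com/a", 0.9, None),
    Page("http://example.com/b", 0.2, None),
    Page("http://example.com/c", 0.5, 10),  # due since day 10
    Page("http://example.com/d", 0.7, 30),  # not due until day 30
]

print(generate_fetchlist(pages, today=20))
print(generate_fetchlist(pages, today=20, top_n=2))
```

Without a limit, the sketch returns every eligible page; with top_n=2 it
keeps only the two best-scored ones. Page d is skipped in both cases because
its fetch interval has not expired yet, even though its score is high.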


 - Juho Mäkinen
