The following page has been changed by JuhoMäkinen:
http://wiki.apache.org/nutch/bin/nutch_generate

New page:

The generate command creates a new fetchlist from the webdb. The fetchlist contains the URLs that can then be fetched with the fetch tool.

Usage: FetchListTool (-local | -ndfs <namenode:port>) <db> <segment_dir> [-refetchonly] [-topN N] [-cutoff cutoffscore] [-numFetchers numFetchers] [-adddays numDays]

Command line parameters:

'''-topN N''' where N is a number of pages. Normally, the "generate" command prepares a fetchlist from all unfetched pages and from pages whose fetch interval has already expired. With -topN, the fetchlist instead contains only the N URLs with the highest score; these are potentially the most interesting ones and should be prioritized in fetching.

- Juho Mäkinen
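
The selection rule described for -topN can be illustrated with a minimal Python sketch. This is not Nutch's actual implementation: the `Page` record, its fields, and the `generate_fetchlist` function are hypothetical names invented for this example, and the real FetchListTool works against the webdb rather than an in-memory list.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    # Hypothetical record; the field names are assumptions, not Nutch's schema.
    url: str
    score: float
    fetched: bool       # has this page ever been fetched?
    next_fetch: float   # time (seconds) at which a refetch becomes due


def generate_fetchlist(pages: List[Page], now: float,
                       top_n: Optional[int] = None) -> List[str]:
    # Candidates: never-fetched pages, or pages whose fetch interval has expired.
    due = [p for p in pages if not p.fetched or p.next_fetch <= now]
    if top_n is not None:
        # Keep only the N highest-scoring candidates, best first.
        due = heapq.nlargest(top_n, due, key=lambda p: p.score)
    return [p.url for p in due]


pages = [
    Page("http://example.com/a", 1.0, False, 0.0),
    Page("http://example.com/b", 5.0, True, 50.0),    # interval expired at now=100
    Page("http://example.com/c", 9.0, True, 500.0),   # not yet due
    Page("http://example.com/d", 3.0, False, 0.0),
]
print(generate_fetchlist(pages, now=100.0))
print(generate_fetchlist(pages, now=100.0, top_n=2))
```

Without `top_n`, every due page is returned; with `top_n=2`, only the two highest-scoring due pages remain. Page `c` is excluded in both cases despite its high score, because its fetch interval has not expired.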
