Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "bin/nutch generate" page has been changed by SebastianNagel:
https://wiki.apache.org/nutch/bin/nutch%20generate?action=diff&rev1=4&rev2=5

Comment:
Add information about scope (per segment / over all segments) of -topN and generate.max.count when multiple segments are generated

  '''<segments_dir>''': Path to the location of our segments directory where the Fetcher Segments are created.

- '''[-force]''': This arguement will force an update even if there appears to be a lock. /!\ : CAUTION: advised /!\
+ '''[-force]''': This argument will force an update even if there appears to be a lock. /!\ : CAUTION: advised /!\

  '''[-topN N]''': Where N is the number of top URLs to be selected. Normally, the "generate" command prepares a fetchlist out of all unfetched pages, or the ones where fetch interval already expired. But if you use -topN, then instead of all unfetched urls you only get N urls with the highest score - potentially the most interesting ones, which should be prioritized in fetching.

@@ -27, +27 @@

  '''[-noNorm]''': The exact same applies for normalisation parameter as does for the filtering option above.

- '''[-maxNumSegments num]''': The (maximum) number of segments to be generated. Default: 1
+ '''[-maxNumSegments num]''': The (maximum) number of segments to be generated. Default: 1 -- Note: if multiple segments are generated, the limit -topN applies to the total number of URLs for all segments taken together, while generate.max.count is applied to every generated segment individually.

  ==== Configuration Files ====

  hadoop-default.xml<<BR>>

