The "bin/nutch updatedb" page has been changed by kiranchitturi:
http://wiki.apache.org/nutch/bin/nutch%20updatedb

New page:

Updatedb is an alias for org.apache.nutch.crawl.CrawlDb

This class takes the output of the fetcher and updates the crawldb accordingly.

Usage:
{{{
bin/nutch updatedb <crawldb> (-dir <segments> | <seg1> <seg2> ...) [-force] [-normalize] [-filter] [-noAdditions]
}}}

'''<crawldb>''': The path to the crawldb directory we wish to update.

'''-dir <segments>''': The path to the parent directory containing the segment, or segments if there are several, to update from.

'''<seg1> <seg2> ...''': Alternatively, a list of paths to the individual segments to update from.

'''[-force]''': This argument forces an update even if the crawldb appears to be locked. /!\ Caution is advised. /!\

'''[-normalize]''': This argument applies any currently configured URLNormalizers to the URLs in the crawldb and segment (usually not needed).

'''[-filter]''': Pass this argument to apply any currently configured URLFilters to the URLs in the crawldb and segment. This can provide better quality results in certain applications.

'''[-noAdditions]''': If you pass this parameter, the updatedb command will only update URLs that already exist in the crawldb, and will not add any URLs newly discovered during a fetch.

CommandLineOptions
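
As an illustration of the usage line above, the following is a minimal sketch of how an invocation might be assembled. The paths crawl/crawldb and crawl/segments are hypothetical placeholders, not values taken from this page, and the script only prints the command (a dry run) rather than running Nutch. It uses only options listed in the usage line.

```shell
#!/bin/sh
# Hypothetical paths (assumptions for illustration); adjust to your own crawl layout.
CRAWLDB="crawl/crawldb"
SEGMENTS_DIR="crawl/segments"

# Assemble the command from the options documented in the usage line:
#   -dir          parent directory containing the segments to update from
#   -filter       apply the configured URLFilters
#   -noAdditions  only update URLs already in the crawldb
CMD="bin/nutch updatedb $CRAWLDB -dir $SEGMENTS_DIR -filter -noAdditions"

# Dry run: print the command instead of executing it.
echo "$CMD"
```

To actually run the update, the printed command would be executed from the Nutch installation directory. Leaving out -noAdditions would let newly discovered URLs be added to the crawldb, as described above.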

