Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "bin/nutch_updatedb" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/bin/nutch_updatedb?action=diff&rev1=4&rev2=5

Comment:
Update to reflect Nutch 1.3 API

- updatedb is an alias for org.apache.nutch.tools.!UpdateDatabaseTool
+ Updatedb is an alias for org.apache.nutch.crawl.CrawlDb

- This class takes the output of the fetcher and updates the page and link DBs accordingly. Eventually, as the database scales, this will broken into several phases, each consuming and emitting batch files, but, for now, we're doing it all here.
+ This class takes the output of the fetcher and updates the crawldb accordingly.

- Usage: bin/nutch org.apache.nutch.tools.!UpdateDatabaseTool (-local | -ndfs <namenode:port>) [-max N] [-noAdditions] <db> <seg_dir> [ <seg_dir> ... ]
+ Usage:
+
+ {{{
+ CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...) [-force] [-normalize] [-filter] [-noAdditions]
+ }}}
+
+ '''<crawldb>''': The path to the crawldb directory we wish to update.
+
+ '''-dir <segments>''': The path to the parent directory that contains the segments to update from.
+
+ '''<seg1> <seg2> ...''': Alternatively, a complete list of paths to the individual segments to update from.
+
+ '''[-force]''': This argument forces an update even if the crawldb appears to be locked. /!\ Use with caution. /!\
+
+ '''[-normalize]''': This argument applies the currently configured URLNormalizers to URLs in the crawldb and segment (usually not needed).
+
+ '''[-filter]''': This argument applies the currently configured URLFilters to URLs in the crawldb and segment. This can give better-quality results in certain applications.
+
+ '''[-noAdditions]''': If this parameter is passed, the updatedb command only updates URLs that already exist in the crawldb and does not add any URLs newly discovered during the fetch.

CommandLineOptions
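To make the shape of the updatedb usage line concrete (one crawldb path, then either `-dir <segments>` or a list of segment paths, plus optional flags), here is a minimal Java sketch that collects such arguments into plain fields. It is hypothetical: the class `UpdateDbArgs`, its fields and its error messages are invented for illustration and are not Nutch's `CrawlDb` source; it models only the syntax, not the actual crawldb update.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Nutch source: gathers the arguments of the
// documented updatedb usage line into plain fields.
public class UpdateDbArgs {
    String crawlDb;                                   // <crawldb>
    String segmentsDir;                               // -dir <segments>, or null
    final List<String> segments = new ArrayList<>();  // <seg1> <seg2> ...
    boolean force, normalize, filter, noAdditions;    // optional flags

    static final String USAGE =
        "Usage: CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...) "
        + "[-force] [-normalize] [-filter] [-noAdditions]";

    static UpdateDbArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(USAGE);
        }
        UpdateDbArgs a = new UpdateDbArgs();
        a.crawlDb = args[0]; // first argument is always the crawldb path
        for (int i = 1; i < args.length; i++) {
            switch (args[i]) {
                case "-dir":
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("-dir needs a path\n" + USAGE);
                    }
                    a.segmentsDir = args[++i]; // consume the directory path
                    break;
                case "-force":       a.force = true;       break;
                case "-normalize":   a.normalize = true;   break;
                case "-filter":      a.filter = true;      break;
                case "-noAdditions": a.noAdditions = true; break;
                default:
                    // Anything else is treated as an individual segment path.
                    a.segments.add(args[i]);
            }
        }
        // The usage line requires either -dir or at least one segment.
        if (a.segmentsDir == null && a.segments.isEmpty()) {
            throw new IllegalArgumentException("no segments given\n" + USAGE);
        }
        return a;
    }

    public static void main(String[] args) {
        UpdateDbArgs a = parse(new String[] {
            "crawl/crawldb", "-dir", "crawl/segments", "-filter", "-noAdditions"});
        System.out.println("crawldb=" + a.crawlDb + " dir=" + a.segmentsDir
            + " filter=" + a.filter + " noAdditions=" + a.noAdditions);
    }
}
```

The paths `crawl/crawldb` and `crawl/segments` are example values only. This sketch accepts `-dir` and individual segments together; the page does not say whether the real tool allows that combination.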

