The "bin/nutch crawl" page has been changed by kiranchitturi:
http://wiki.apache.org/nutch/bin/nutch%20crawl

Comment:
change of url from last crawl page

New page:
Crawl is an alias for org.apache.nutch.crawl.Crawl

This class performs a complete crawl given a set of root URLs.

Usage:
{{{
bin/nutch crawl <urlDir> [-solr <solrURL>] [-dir d] [-threads n] [-depth i] [-topN N]
}}}
'''<urlDir>''': A directory of text files containing URL lists. This directory must already exist. An example would be ${NUTCH_HOME}/urls

'''[-solr <solrURL>]''': Passes the URL of a Solr instance as an indexing parameter, so that the crawl results are indexed into Solr as part of the same command.

'''[-dir d]''': The directory Nutch should use to store the crawl data.

'''[-threads n]''': The number of threads Nutch should use when fetching.

'''[-depth i]''': How deep Nutch should crawl, counted in link levels from the root URLs. If no value is given, Nutch defaults to 5. For example, with -depth 1 Nutch will only index the first level. With -depth 2 (or more), Nutch will follow outlinks for that number of levels.

'''[-topN N]''': The maximum number of pages Nutch will fetch at each level of the crawl.

CommandLineOptions
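As a rough illustration of how the options in the usage line combine, here is a minimal shell sketch. The helper name build_crawl_cmd, the Solr URL, and all option values are hypothetical and not taken from the page. The helper only prints the command line, so the option layout can be checked without a Nutch installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): assembles a "bin/nutch crawl"
# command line from positional arguments and prints it instead of running it.
# Arguments: urlDir [solrURL] [dir] [threads] [depth] [topN]
# An empty string for an optional argument leaves that option out.
build_crawl_cmd() {
  url_dir=$1; solr_url=$2; crawl_dir=$3; threads=$4; depth=$5; top_n=$6

  # <urlDir> is the only mandatory argument.
  printf '%s' "bin/nutch crawl $url_dir"

  # Append each optional flag only when a value was supplied.
  if [ -n "$solr_url" ];  then printf ' -solr %s' "$solr_url"; fi
  if [ -n "$crawl_dir" ]; then printf ' -dir %s' "$crawl_dir"; fi
  if [ -n "$threads" ];   then printf ' -threads %s' "$threads"; fi
  if [ -n "$depth" ];     then printf ' -depth %s' "$depth"; fi
  if [ -n "$top_n" ];     then printf ' -topN %s' "$top_n"; fi
  printf '\n'
}

# Example values are assumptions chosen for illustration only.
build_crawl_cmd urls http://localhost:8983/solr/ crawl 10 3 50
```

To actually start a crawl, the printed line would be run from ${NUTCH_HOME}, with the urls directory already in place.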

