Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by susam:
http://wiki.apache.org/nutch/Crawl

The comment on the change is:
fixed typos

------------------------------------------------------------------------------
  == Introduction ==
- This is a script to crawl an Internet or the web. It does not crawl using the 'bin/crawl' tool or 'Crawl' class present in Nutch, therefore the filters present in 'conf/crawl-urlfilter.txt ' has not effect on this script. The filters for this script must be set in 'regex-urlfilter.txt'.
+ This is a script to crawl an Intranet as well as the web. It does not crawl using the 'bin/crawl' tool or 'Crawl' class present in Nutch. Therefore the filters present in 'conf/crawl-urlfilter.txt ' has no effect on this script. The filters for this script must be set in 'regex-urlfilter.txt'.

  == Steps ==
  The complete job of this script has been divided broadly into 8 steps.