Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by MatthiasGuenter:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  ==== While fetching I get UnknownHostException for known hosts ====
  Make sure your DNS server is working and/or it can handle the load of requests.
+ 
+ 
+ ==== It seems as if not all links are followed in the pages in my URL lists ====
+ 
+ 1.) Make sure that your expressions in conf/crawl-urlfilter.txt are correct; perhaps the links are dropped there.
+ 2.) Make sure that in conf/nutch-site.xml the following parameters are set appropriately:
+  * http.content.limit: otherwise some content may never be fetched at all
+  * db.max.outlinks.per.page: otherwise the links might be dropped.
+ 3.) Make sure you have the parse-js and all other necessary plugins active in conf/nutch-site.xml
  
  === Updating ===
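
For the FAQ entry on links that are not followed, the conf/nutch-site.xml overrides might look roughly like the fragment below. This is only a sketch: the property names http.content.limit, db.max.outlinks.per.page and plugin.includes come from Nutch, but the values are illustrative assumptions, and the plugin list in particular differs between Nutch versions. It is safer to copy the plugin.includes value from your own conf/nutch-default.xml and add parse-js to it.

```xml
<!-- Place these inside the existing root element of conf/nutch-site.xml. -->

<!-- Assumed value: raise the per-document download limit (in bytes) so
     long pages are not truncated before their links are extracted. -->
<property>
  <name>http.content.limit</name>
  <value>1048576</value>
</property>

<!-- Assumed value: process more outlinks per page than the default. -->
<property>
  <name>db.max.outlinks.per.page</name>
  <value>500</value>
</property>

<!-- Assumed plugin list: a regular expression naming the active plugins,
     here with "js" added to the parse-* group. Start from the value in
     your own nutch-default.xml rather than from this one. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)</value>
</property>
```

Larger limits mean more data fetched and stored per page, so it may be worth raising them gradually and checking the fetch logs after each change.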
