The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  ==== Is there a mail archive? ====
  Yes: http://www.mail-archive.com/nutch-user%40lucene.apache.org/maillist.html or http://www.nabble.com/Nutch-f362.html .
+ 
+ ==== What Java version is required to run Nutch? ====
+ 
+ Nutch 0.7 runs on Java 1.4 and later.
  ==== My system does not find the segments folder. Why? OR How do I tell the ''Nutch Servlet'' where the index files are located? ====
@@ -82, +86 @@
  * Set the NUTCH_CONF_DIR environment variable to point to the directory you created.
  * Run $NUTCH_HOME/bin/nutch so that it picks up the NUTCH_CONF_DIR environment variable. Check the command output for the lines where the configs are loaded, to make sure they really are loaded from your custom dir.
  * Happy using.
+ 
+ ==== bin/nutch generate produces an empty fetchlist. What can I do? ====
+ 
+ When a page is fetched, it is timestamped in the webdb. It will not be included in a fetchlist until its time is up, that is, until it is due to be fetched again. For example, if you generated a fetchlist and then deleted the segment dir it created, calling generate again will produce an empty fetchlist.
+ You have two choices:
+ 1) Set your system date 30 days ahead (assuming you haven't changed the default settings) and re-run bin/nutch generate.
+ 
+ 2) Run bin/nutch generate with -adddays 30 (again assuming the default settings) so that generate treats the pages as due.
+ 
+ After generate, you can run bin/nutch fetch.
  ==== While fetching I get UnknownHostException for known hosts ====
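The empty-fetchlist answer added in this change can be illustrated with a small model. The following is a hypothetical, simplified Java sketch, not Nutch's actual code. It makes three assumptions. First, each webdb entry carries a single next-fetch timestamp, set to the fetch time plus a 30-day interval, which is the default the FAQ alludes to. Second, generate keeps only entries whose timestamp is at or before "now" plus the -adddays value. Third, the class and method names (`FetchlistSketch`, `markFetched`, `generate`) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified model of why "bin/nutch generate" can return an
 * empty fetchlist. These are NOT Nutch's real classes or methods.
 */
public class FetchlistSketch {

    /** Hypothetical webdb record: a URL and the time it becomes due again. */
    static final class Page {
        final String url;
        final long nextFetchMillis;

        Page(String url, long nextFetchMillis) {
            this.url = url;
            this.nextFetchMillis = nextFetchMillis;
        }
    }

    // Assumed default re-fetch interval: 30 days, per the FAQ's wording.
    static final long FETCH_INTERVAL_MILLIS = TimeUnit.DAYS.toMillis(30);

    /** Stamp a page as fetched: it is not due again until the interval elapses. */
    static Page markFetched(String url, long fetchTimeMillis) {
        return new Page(url, fetchTimeMillis + FETCH_INTERVAL_MILLIS);
    }

    /**
     * Keep only pages whose due time has come. addDays plays the role of
     * "-adddays": it shifts "now" forward, as moving the system clock would.
     */
    static List<String> generate(List<Page> webdb, long nowMillis, int addDays) {
        long effectiveNow = nowMillis + TimeUnit.DAYS.toMillis(addDays);
        List<String> fetchlist = new ArrayList<>();
        for (Page p : webdb) {
            if (p.nextFetchMillis <= effectiveNow) {
                fetchlist.add(p.url);
            }
        }
        return fetchlist;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<Page> webdb = new ArrayList<>();
        // Pages stamped just now, e.g. their segment dir was later deleted.
        webdb.add(markFetched("http://example.com/", now));
        webdb.add(markFetched("http://example.org/", now));

        System.out.println(generate(webdb, now, 0).size());  // prints 0: nothing is due yet
        System.out.println(generate(webdb, now, 30).size()); // prints 2: -adddays 30 makes both due
    }
}
```

In this model, setting the system date forward and passing -adddays both work for the same reason: each one moves the comparison point past the stored due time.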
