Hi,

The fetch command returns immediately without downloading any URLs, at
least in my experience. Could somebody else try fetching some URLs to
confirm whether the problem is on my side?

I check out, build and set up Nutch as follows:
$ export NUTCH_ROOT=./nutch
$ svn co http://svn.apache.org/repos/asf/nutch/trunk/ $NUTCH_ROOT
$ (cd $NUTCH_ROOT && ant)
$ export NUTCH_HOME=$NUTCH_ROOT/runtime/local

Then a little bit of configuration: I set the http.agent.name and
http.robots.agents properties in $NUTCH_HOME/conf/nutch-default.xml,
and the Gora settings in $NUTCH_HOME/conf/gora.properties.

Finally:
$ $NUTCH_HOME/bin/nutch inject seeds
InjectorJob: starting
InjectorJob: urlDir: seeds
InjectorJob: finished
$ $NUTCH_HOME/bin/nutch generate
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: done
GeneratorJob: generated batch id: 1291539079-2006862361
$ $NUTCH_HOME/bin/nutch fetch 1291539079-2006862361
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1291539079-2006862361
FetcherJob: done
$

Nothing gets fetched.

A quick fix is to wait for the job to complete before recording its status:

Index: src/java/org/apache/nutch/fetcher/FetcherJob.java
===================================================================
--- src/java/org/apache/nutch/fetcher/FetcherJob.java   (revision 1042291)
+++ src/java/org/apache/nutch/fetcher/FetcherJob.java   (working copy)
@@ -174,6 +174,7 @@
     } else {
       currentJob.setNumReduceTasks(numTasks);
     }
+    currentJob.waitForCompletion(true);
     ToolUtil.recordJobStatus(null, currentJob, results);
     return results;
   }
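
To illustrate why that one line could matter, here is a minimal sketch of
the submit-versus-wait pattern. It uses only java.util.concurrent as a
stand-in for the real job; the class WaitForCompletionSketch and the method
runJob are made up for illustration and are not part of Nutch or Hadoop.
The assumption is that recording a job's status without first waiting for
the job behaves much like reading a counter before the background task has
done any work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical analogy only: a background task plays the role of the fetch job.
class WaitForCompletionSketch {

    // Starts a task that "fetches" `urls` items and returns the count seen
    // at the moment the status is recorded.
    static int runJob(int urls, boolean waitForCompletion) {
        AtomicInteger fetched = new AtomicInteger();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> job = pool.submit(() -> {
                for (int i = 0; i < urls; i++) {
                    try {
                        Thread.sleep(20); // simulated download time
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return; // stop early when the pool is shut down
                    }
                    fetched.incrementAndGet();
                }
            });
            if (waitForCompletion) {
                job.get(); // block until the task has finished
            }
            return fetched.get(); // "record status" at this point
        } catch (Exception e) {
            throw new IllegalStateException("job failed", e);
        } finally {
            pool.shutdownNow(); // interrupts the task if it is still running
        }
    }

    public static void main(String[] args) {
        System.out.println("without waiting: " + runJob(5, false));
        System.out.println("with waiting:    " + runJob(5, true));
    }
}
```

With waiting, the count always equals the number of URLs. Without it, the
count is read while the task is still sleeping, so it is typically zero,
which matches the "done, but nothing fetched" symptom above.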


Alexis
