Hi all:

I'd like to run Nutch Crawl in Eclipse. I have followed the
"RunNutchInEclipse" tutorial (http://wiki.apache.org/nutch/RunNutchInEclipse).
However, when I tried to run the crawler, the following exception occurred:

==============================================================================
solrUrl is not set, indexing will be skipped...
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
==============================================================================


The Eclipse Run Configurations are all set according to the tutorial.
==============================================================================

Main Class: org.apache.nutch.crawl.Crawl
Program Arguments: urls -dir crawl -depth 2 -topN 10
VM arguments: -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
Working directory: Default

==============================================================================
I have also set the classpath, source path, Ivy path, etc. according to the
tutorial.


I can build using Ant in Eclipse. Afterwards, I can successfully run the
crawl script manually with

$NUTCH_HOME/runtime/local/bin/nutch crawl urls -dir crawl -depth 2 -topN 10


This performs the crawl correctly and gives the expected result. But
when I try to run it in Eclipse, it always fails.



Has anyone run into a similar problem and knows how it can be solved?
I appreciate your time and help.


Andy
