Hello guys, I am pretty new to Nutch, so bear with me. I have been
encountering an IOException during one of my test crawls. I am using Nutch
1.6 with Hadoop 0.20.2 (I chose this version for Windows compatibility
when setting file access rights).

I am running Nutch through Eclipse. I followed this guide to import Nutch
from SVN: http://wiki.apache.org/nutch/RunNutchInEclipse

My crawler's code is from this website:
http://cmusphinx.sourceforge.net/2012/06/building-a-java-application-with-apache-nutch-and-solr/

Here is the system exception log:

solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 1
depth = 1
solrUrl=null
topN = 1
Injector: starting at 2013-03-31 23:51:11
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:218)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at rjpb.sp.crawler.CrawlerTest.main(CrawlerTest.java:51)

I see these path-related calls before Injector.inject() in Crawl.java:

Path crawlDb = new Path(dir + "/crawldb");
Path linkDb = new Path(dir + "/linkdb");
Path segments = new Path(dir + "/segments");
Path indexes = new Path(dir + "/indexes");
Path index = new Path(dir + "/index");
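As far as I understand, new Path(...) only builds a path name and does not create anything on disk; the Injector creates crawl/crawldb itself on the first run, so those folders being absent should be normal. To rule out a missing-input problem I put together this small pre-run sanity check (a hypothetical helper of my own using plain java.io, not Nutch code) that just verifies the seed directory and seed list exist:

```java
import java.io.File;

public class CrawlDirCheck {
    // Check the only input a fresh Nutch crawl actually needs up front:
    // the seed directory and a non-empty seed list inside it. The
    // crawldb/linkdb/segments folders are created by Nutch itself,
    // so their absence before the first run is expected.
    public static boolean seedsPresent(String urlDir) {
        File dir = new File(urlDir);
        File seeds = new File(dir, "seed.txt");
        return dir.isDirectory() && seeds.isFile() && seeds.length() > 0;
    }

    public static void main(String[] args) {
        String urlDir = args.length > 0 ? args[0] : "urls";
        if (seedsPresent(urlDir)) {
            System.out.println("seed list found in " + urlDir);
        } else {
            System.out.println("missing or empty " + urlDir + "/seed.txt");
        }
    }
}
```

Running it from the Eclipse project root prints whether urls/seed.txt is actually visible from the working directory the crawl runs in.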

Currently my Eclipse project does not include the folders crawldb, linkdb,
segments, etc. I think my problem is that I have not set up all the files
necessary for crawling. I have only set up nutch-site.xml,
regex-urlfilter.txt, and urls/seed.txt. Any advice on the matter would be
of great help. Thanks!
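In case it matters, my nutch-site.xml is roughly the following (a minimal sketch; http.agent.name is the one property I know must be non-empty, and the value here is just a placeholder):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- http.agent.name must be set to a non-empty string,
       otherwise Nutch jobs refuse to run -->
  <property>
    <name>http.agent.name</name>
    <value>MyTestCrawler</value>
  </property>
</configuration>
```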



--
View this message in context: 
http://lucene.472066.n3.nabble.com/IOException-during-Crawl-run-JobClient-runJob-tp4052732.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
