Hi guys,
I am currently running Nutch 0.8.2-dev on MS Windows Vista using Sun JVM 6. I have set up Nutch in my IDE (NetBeans) and it works great. Afterwards, I applied the NUTCH-61 patch (https://issues.apache.org/jira/browse/NUTCH-61) to my local version.

When I run Nutch within the IDE, all the steps are performed with no problem. I can view the content of the crawldb, and the segments and index are fine. If I run it in a loop, the process executes without any problem.

I then packaged that version and ran it in a testing environment. At first, no index was being created. Since Nutch wasn't giving any errors, I set the Hadoop logging level to debug. Some of the debug lines from Hadoop look suspicious; an extract is below.

From the log, I can see that the problem occurs at the Inject and Generate stages. Can anybody help me overcome this problem? I will be glad to provide a working version of NUTCH-61 once it is tested.

2007-04-05 16:35:30,976 INFO mapred.LocalJobRunner - E:/iDna-nutch-RC1/iDna-nutch-launcher/test/urls/urls:0+55
2007-04-05 16:35:31,073 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2007-04-05 16:35:31,074 INFO crawl.FetchSchedule - defaultInterval=7.46496E9
2007-04-05 16:35:31,074 INFO crawl.FetchSchedule - maxInterval=2592000.0
2007-04-05 16:35:31,084 DEBUG io.SequenceFile - running sort pass
2007-04-05 16:35:31,096 INFO io.SequenceFile - flushing segment 0
2007-04-05 16:35:31,928 INFO mapred.JobClient - map 100% reduce 0%
2007-04-05 16:35:31,940 INFO mapred.LocalJobRunner - reduce > reduce
2007-04-05 16:35:32,928 INFO mapred.JobClient - Job complete: job_ui1cje
2007-04-05 16:35:32,928 INFO crawl.Injector - Injector: Merging injected urls into crawl db.
2007-04-05 16:35:32,938 DEBUG conf.Configuration - java.io.IOException: config(config)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:86)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:97)
    at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:26)
    at org.apache.nutch.crawl.CrawlDb.createJob(CrawlDb.java:74)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:222)
    at org.apache.nutch.crawl.Injector.main(Injector.java:242)
    at com.idna.nutch.launcher.CrawlerManager.injector(CrawlerManager.java:63)
    at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:209)
2007-04-05 16:35:32,943 INFO conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/hadoop-default.xml
2007-04-05 16:35:32,951 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-default.xml
2007-04-05 16:35:32,961 INFO conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/mapred-default.xml
2007-04-05 16:35:32,966 INFO conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/mapred-default.xml
2007-04-05 16:35:32,973 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-site.xml
2007-04-05 16:35:32,980 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/hadoop-site.xml
2007-04-05 16:35:33,040 DEBUG conf.Configuration - java.io.IOException: config(config)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:86)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:58)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:182)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:292)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:224)
    at org.apache.nutch.crawl.Injector.main(Injector.java:242)
    at com.idna.nutch.launcher.CrawlerManager.injector(CrawlerManager.java:63)
    at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:209)
2007-04-05 16:35:33,501 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2007-04-05 16:35:33,501 INFO crawl.FetchSchedule - defaultInterval=7.46496E9
2007-04-05 16:35:33,501 INFO crawl.FetchSchedule - maxInterval=2592000.0
2007-04-05 16:35:33,508 DEBUG io.SequenceFile - running sort pass
2007-04-05 16:35:33,514 INFO io.SequenceFile - flushing segment 0
2007-04-05 16:35:33,639 INFO mapred.LocalJobRunner - reduce > reduce
2007-04-05 16:35:34,120 INFO mapred.JobClient - Job complete: job_qzwgkh
2007-04-05 16:35:34,429 INFO crawl.Injector - Injector: done
2007-04-05 16:35:34,439 INFO crawl.Generator - topN: 100
2007-04-05 16:35:34,439 DEBUG conf.Configuration - java.io.IOException: config()
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:67)
    at org.apache.nutch.util.NutchConfiguration.create(NutchConfiguration.java:50)
    at org.apache.nutch.crawl.Generator.main(Generator.java:416)
    at com.idna.nutch.launcher.CrawlerManager.autoGenSegList(CrawlerManager.java:80)
    at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:211)
2007-04-05 16:35:34,443 INFO conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/hadoop-default.xml
2007-04-05 16:35:34,450 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-default.xml
2007-04-05 16:35:34,462 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-site.xml
2007-04-05 16:35:34,468 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/hadoop-site.xml
2007-04-05 16:35:35,470 INFO crawl.Generator - Generator: starting
2007-04-05 16:35:35,470 INFO crawl.Generator - Generator: segment: test/segments/20070405163535
2007-04-05 16:35:35,470 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2007-04-05 16:35:35,471 DEBUG conf.Configuration - java.io.IOException: config(config)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:86)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:97)
    at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:26)
    at org.apache.nutch.crawl.Generator.generate(Generator.java:309)
    at org.apache.nutch.crawl.Generator.main(Generator.java:417)
    at com.idna.nutch.launcher.CrawlerManager.autoGenSegList(CrawlerManager.java:80)
    at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:211)

===========================
Armel T. Nene
iDNA Solutions LTD
Tel: +44 (20) 7257 6124
Mobile: +44 (7886)950 483
Web: http://www.idna-solutions.com
Blog: http://blog.idna-solutions.com
