generator create fetchlist randomly
-----------------------------------
Key: NUTCH-361
URL: http://issues.apache.org/jira/browse/NUTCH-361
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 0.9.0
Environment: Java 1.5, FreeBSD 6.1
Reporter: Uros Gruber
Priority: Critical
I noticed problems while generating a fetchlist, and I have already posted some
information to the users list. Today I checked release 0.8, and I'm certain that
the problem exists only in versions later than that. I have tested only 0.8 and
today's svn checkout.
The problem is that the generator builds the fetchlist from the crawldb, but
every time I run it the fetchlist contains a different number of URLs.
For example, I put in the 6 test URLs we use for testing, and in only 5 of 20
runs were all of the URLs listed in the fetchlist; sometimes only one was. The
config was always the same, including when testing version 0.8.
I tried to debug what might be going wrong, but all I found was that all of the
URLs were present in /tmp yet somehow missing from crawl_generate.
When I enable DEBUG logging I also see some entries like the following, but I
doubt that they have anything to do with this:
2006-09-02 20:14:20,147 DEBUG conf.Configuration - java.io.IOException: config(config)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:87)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:98)
    at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:26)
    at org.apache.nutch.crawl.Generator.generate(Generator.java:330)
    at org.apache.nutch.crawl.Generator.run(Generator.java:405)
    at org.apache.nutch.util.ToolBase.doMain(ToolBase.java:145)
    at org.apache.nutch.crawl.Generator.main(Generator.java:372)