Sami Siren (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-361?page=comments#action_12432322 ] Sami Siren commented on NUTCH-361:
----------------------------------

I started to write (already put some on svn trunk) some simple junit tests for 
the main tools (inject, generate, fetch). If you can extend some of those to 
demonstrate this problem then it would be easier to track down.

I ran through it and here is the problem that popped out:

  [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 4.294 sec
  [junit] Test org.apache.nutch.crawl.TestGenerator FAILED

I ran this on the server, but I have problems running the test from Eclipse.
java.lang.ArithmeticException: / by zero
   at org.apache.nutch.crawl.PartitionUrlByHost.getPartition(PartitionUrlByHost.java:49)
   at org.apache.hadoop.mapred.MapTask$2.collect(MapTask.java:152)
   at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:223)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:51)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:195)
   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:106)

It is probably a configuration problem.
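
To illustrate how such a "/ by zero" could come out of a partitioner, here is a minimal sketch. It assumes the partition is chosen by taking a hash of the host modulo the number of reduce tasks, and that this number ends up as 0 in the failing setup; neither has been checked against the actual PartitionUrlByHost source. The class PartitionSketch and its method are hypothetical and are not Nutch code.

```java
public class PartitionSketch {

    // Hypothetical stand-in for a host-based partitioner (not the Nutch source).
    // Assumption: the partition is the host hash modulo the reduce task count.
    static int partition(String host, int numReduceTasks) {
        // Clear the sign bit so the result is never negative.
        int hash = host.hashCode() & Integer.MAX_VALUE;
        // Integer remainder by zero throws java.lang.ArithmeticException.
        return hash % numReduceTasks;
    }

    public static void main(String[] args) {
        // With at least one reduce task the result is a valid partition index.
        System.out.println(partition("example.com", 2));

        // With zero reduce tasks the same call fails.
        try {
            partition("example.com", 0);
        } catch (ArithmeticException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

If that assumption holds, checking which number of reduce tasks the test configuration passes to the job would be the first thing to look at.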

regards

Uros
generator create fetchlist randomly
-----------------------------------

                Key: NUTCH-361
                URL: http://issues.apache.org/jira/browse/NUTCH-361
            Project: Nutch
         Issue Type: Bug
         Components: fetcher
   Affects Versions: 0.9.0
        Environment: Java 1.5, FreeBSD 6.1
           Reporter: Uros Gruber
           Priority: Critical

I noticed problems while generating the fetchlist. I already posted some info to the 
users list. Today I checked release 0.8 and I'm certain that the problem exists only in 
versions later than that. I have tested only 0.8 and today's svn.
The problem is that the generator builds the fetchlist from the crawldb, but every time I 
run it there is a different number of urls in the fetchlist.
For example, I put in the 6 test urls we use for testing, and in only 5 of 20 runs 
were all urls listed in the fetchlist; sometimes only one was. The config was always the 
same, including when testing version 0.8.
I tried to debug what might be going wrong, but all I found is that all the urls were 
in /tmp while some were somehow missing from crawl_generate.
I also see some of the following if I enable DEBUG logging, but I doubt that it has 
anything to do with this problem:
2006-09-02 20:14:20,147 DEBUG conf.Configuration - java.io.IOException: config(config)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:87)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:98)
        at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:26)
        at org.apache.nutch.crawl.Generator.generate(Generator.java:330)
        at org.apache.nutch.crawl.Generator.run(Generator.java:405)
        at org.apache.nutch.util.ToolBase.doMain(ToolBase.java:145)
        at org.apache.nutch.crawl.Generator.main(Generator.java:372)

