Hiran Chaudhuri created NUTCH-3075:
--------------------------------------

             Summary: tld plugin makes injector crash
                 Key: NUTCH-3075
                 URL: https://issues.apache.org/jira/browse/NUTCH-3075
             Project: Nutch
          Issue Type: Bug
          Components: injector
    Affects Versions: 1.21
         Environment: * Ubuntu 22 LTS
* openjdk version "21.0.4" 2024-07-16 LTS
            Reporter: Hiran Chaudhuri
I cloned the current master branch (commit id d6f55b8ea6f5809cef5a31239e5760be23742c00), which compiles cleanly to apache-nutch-1.21-SNAPSHOT.job, even after I added my own protocol-imap implementation. Crawling works to some degree; I am experimenting heavily with IMAP and the data I receive in Solr.

Looking at [IndexStructure|https://cwiki.apache.org/confluence/display/NUTCH/IndexStructure], I hoped to get better information by adding all the plugins mentioned there. So I reconfigured nutch-site.xml, in particular the `plugin.includes` property, to include them all. As soon as `tld` is included, the injector dies while seeding my CrawlDb:

{noformat}
2024-10-11 23:27:51,519 INFO org.apache.nutch.crawl.Injector [main] Injecting seed URL file file:/home/hiran/NetBeansProjects/nutch/urls/seed.txt
2024-10-11 23:27:52,778 ERROR org.apache.nutch.crawl.Injector [main] Injector job did not succeed, job id: job_local1500911141_0001, job status: FAILED, reason: NA
2024-10-11 23:27:52,779 ERROR org.apache.nutch.crawl.Injector [main] Injector: java.lang.RuntimeException: Injector job did not succeed, job id: job_local1500911141_0001, job status: FAILED, reason: NA
	at org.apache.nutch.crawl.Injector.inject(Injector.java:446)
	at org.apache.nutch.crawl.Injector.run(Injector.java:574)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
	at org.apache.nutch.crawl.Injector.main(Injector.java:538)
{noformat}

Simply removing `tld` from the property makes the problem go away.

* Could there be a more informative error message?
* Why does the tld plugin crash the injector phase at all?
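When narrowing down a failure like this, it helps to confirm exactly which plugins a given `plugin.includes` value activates, so that the failing and the working configuration can be shown to differ only by `tld`. The following is a minimal Java sketch, assuming (as we understand Nutch's plugin repository to behave) that `plugin.includes` is a Java regular expression that must match a plugin id in full. The class `PluginIncludesCheck`, its `activated` method, and both includes values are hypothetical illustrations, not Nutch APIs and not the reporter's actual configuration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Nutch: lists which plugin ids a plugin.includes
// value would activate, ASSUMING the value is a Java regex that must match the whole id.
public class PluginIncludesCheck {

  static List<String> activated(String includes, List<String> pluginIds) {
    Pattern pattern = Pattern.compile(includes);
    return pluginIds.stream()
        .filter(id -> pattern.matcher(id).matches()) // full match, not a substring search
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical values for illustration; the real ones live in nutch-site.xml.
    String base = "protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|more)|indexer-solr";
    String withTld = base + "|tld";
    List<String> ids = List.of("protocol-http", "parse-html", "index-more", "tld", "indexer-solr", "scoring-opic");

    System.out.println("without tld: " + activated(base, ids));
    System.out.println("with tld:    " + activated(withTld, ids));
  }
}
```

If the two printed lists differ only by `tld`, the next step is to find the task-level exception hidden behind the generic "FAILED, reason: NA" status, which in local mode is presumably written to the task logs rather than to the Injector's own error message.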