Sebastian Nagel created NUTCH-3044:
--------------------------------------
Summary: Generator: NPE when extracting the host part of a URL fails
Key: NUTCH-3044
URL: https://issues.apache.org/jira/browse/NUTCH-3044
Project: Nutch
Issue Type: Bug
Components: generator
Affects Versions: 1.20
Reporter: Sebastian Nagel
Fix For: 1.21
When extracting the host part of a URL fails, the Generator job fails with
an NPE in the SelectorReducer. The issue is reproducible if the CrawlDb
contains a malformed URL, for example, a URL with an unsupported scheme
(smb://).
{noformat}
Caused by: java.lang.NullPointerException
at org.apache.nutch.crawl.Generator$SelectorReducer.reduce(Generator.java:439)
at org.apache.nutch.crawl.Generator$SelectorReducer.reduce(Generator.java:300)
{noformat}
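A minimal sketch of the failure mode, not the actual Generator code: with a stock JDK, java.net.URL rejects a scheme that has no registered protocol handler (such as smb://) by throwing MalformedURLException. A helper that maps that exception to null hands callers a null host, and dereferencing it without a check gives an NPE like the one above. The class HostExtractionSketch and the methods hostOrNull and countByHost are hypothetical names used only for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class HostExtractionSketch {

    /** Hypothetical helper: returns the host part of the URL, or null if the URL cannot be parsed. */
    public static String hostOrNull(String url) {
        try {
            return new URL(url).getHost();
        } catch (MalformedURLException e) {
            // Thrown, for example, for "smb://..." when no handler is registered for the scheme.
            return null;
        }
    }

    /** Counts URLs per host and skips entries whose host could not be extracted. */
    public static Map<String, Integer> countByHost(List<String> urls) {
        Map<String, Integer> counts = new HashMap<>();
        for (String url : urls) {
            String host = hostOrNull(url);
            if (host == null) {
                // Without this check, host.toLowerCase(...) below would throw a NullPointerException.
                continue;
            }
            counts.merge(host.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "http://example.org/a",
            "http://EXAMPLE.org/b",
            "smb://fileserver/share/doc.txt");
        System.out.println(countByHost(urls));
    }
}
```

Skipping the entry is only one possible way to handle it; whether such a URL should be skipped, logged, or counted under a fallback key is a design decision the report does not settle.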