Sebastian Nagel created NUTCH-3044:
--------------------------------------

             Summary: Generator: NPE when extracting the host part of a URL fails
                 Key: NUTCH-3044
                 URL: https://issues.apache.org/jira/browse/NUTCH-3044
             Project: Nutch
          Issue Type: Bug
          Components: generator
    Affects Versions: 1.20
            Reporter: Sebastian Nagel
             Fix For: 1.21


When extracting the host part of a URL fails, the Generator job fails because 
of an NPE in the SelectorReducer. The issue is reproducible if the CrawlDb 
contains a malformed URL, for example, a URL with an unsupported scheme 
(smb://).

{noformat}
Caused by: java.lang.NullPointerException
  at org.apache.nutch.crawl.Generator$SelectorReducer.reduce(Generator.java:439)
  at org.apache.nutch.crawl.Generator$SelectorReducer.reduce(Generator.java:300)
{noformat}
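
The failure mode can be illustrated with a minimal, self-contained sketch. It 
is not the Nutch code: the class HostGuard and the methods hostOrNull and 
countByHost are hypothetical names chosen for illustration. The sketch assumes 
that host extraction returns null when the URL cannot be parsed, and shows a 
null check that skips such URLs instead of dereferencing the missing host.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only, not the actual Generator implementation.
final class HostGuard {

  // Returns the host part of the URL, or null if it cannot be extracted.
  // An unsupported scheme such as smb:// has no protocol handler in a
  // default JVM, so URI.toURL() throws MalformedURLException.
  static String hostOrNull(String url) {
    try {
      String host = new URI(url).toURL().getHost();
      return (host == null || host.isEmpty()) ? null : host;
    } catch (URISyntaxException | MalformedURLException
        | IllegalArgumentException e) {
      return null;
    }
  }

  // Counts URLs per host and skips URLs without a host, so that a single
  // bad entry does not abort the whole run.
  static Map<String, Integer> countByHost(List<String> urls) {
    Map<String, Integer> counts = new HashMap<>();
    for (String url : urls) {
      String host = hostOrNull(url);
      if (host == null) {
        continue; // guard: without it, host.toLowerCase() below would throw
      }
      counts.merge(host.toLowerCase(), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> urls = List.of(
        "https://nutch.apache.org/index.html",
        "https://nutch.apache.org/download/",
        "smb://fileserver/share/doc.txt");
    System.out.println(countByHost(urls));
  }
}
```

Whether such URLs should be skipped silently, logged, or counted separately is 
a design decision that the sketch leaves open.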



