[ https://issues.apache.org/jira/browse/NUTCH-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-2468:
-----------------------------------
    Priority: Minor  (was: Major)

> should filter out invalid URLs by default
> -----------------------------------------
>
>                 Key: NUTCH-2468
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2468
>             Project: Nutch
>          Issue Type: Improvement
>          Components: bin
>    Affects Versions: 1.12
>            Reporter: Michael Coffey
>            Priority: Minor
>             Fix For: 1.14
>
>
> Some Nutch components, by default, should reject invalid URLs. This was
> recently discussed in the users mailing list and has affected my work for a
> while. Although there may be some special-purpose needs to collect invalid
> URLs, they are not generally useful for crawling.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)