Michael Coffey created NUTCH-2468:
-------------------------------------
Summary: should filter out invalid URLs by default
Key: NUTCH-2468
URL: https://issues.apache.org/jira/browse/NUTCH-2468
Project: Nutch
Issue Type: Bug
Components: bin
Affects Versions: 1.12
Reporter: Michael Coffey
Some Nutch components should reject invalid URLs by default. This was
recently discussed on the user mailing list, and the problem has affected my
work for a while. Although there may be special-purpose needs to collect
invalid URLs, such URLs are not generally useful for crawling.
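To make "invalid" concrete, here is a minimal sketch in Java of the kind of check a default filter could apply. It assumes that a URL is invalid if the string fails `java.net.URI` parsing, uses a scheme other than `http`/`https`, or has no host. The class name `InvalidUrlFilter` and these rules are hypothetical. The class only borrows the convention of Nutch's `URLFilter` plugins, which return `null` to reject a URL. It is not an existing Nutch plugin.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch (not an existing Nutch plugin): rejects malformed URLs,
 * following the URLFilter convention of returning null to reject.
 */
public class InvalidUrlFilter {

    // Assumption: a typical web crawl only wants http and https URLs.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    /** Returns the URL unchanged if it looks valid, or null to reject it. */
    public static String filter(String urlString) {
        if (urlString == null || urlString.isEmpty()) {
            return null;
        }
        final URI uri;
        try {
            uri = new URI(urlString); // throws on syntax errors such as embedded spaces
        } catch (URISyntaxException e) {
            return null;
        }
        String scheme = uri.getScheme();
        if (scheme == null || !ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
            return null; // relative URL or non-web scheme
        }
        // getHost() is null when the authority is missing or is not a server-based host.
        if (uri.getHost() == null) {
            return null;
        }
        return urlString;
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://nutch.apache.org/",
            "http://exa mple.com/", // space in the host: syntax error
            "http:///no-host",      // no host at all
            "javascript:void(0)",   // non-web scheme
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (filter(s) != null ? "keep" : "reject"));
        }
    }
}
```

Note that `java.net.URI` is fairly strict. For example, a hostname containing an underscore yields a `null` from `getHost()`, so this sketch would reject it. A real default filter would have to decide how lenient to be.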