Sebastian Nagel created NUTCH-2705:
--------------------------------------
Summary: urlfilter-validator rejects IPv6 URLs
Key: NUTCH-2705
URL: https://issues.apache.org/jira/browse/NUTCH-2705
Project: Nutch
Issue Type: Bug
Components: plugin
Affects Versions: 1.15
Reporter: Sebastian Nagel
Fix For: 1.16
The plugin urlfilter-validator rejects URLs that use an IPv6 address as
hostname/authority (written according to [RFC
2732|https://tools.ietf.org/html/rfc2732]):
{noformat}
% echo "http://[2010:836B:4179::836B:4179]/" \
| bin/nutch filterchecker -filterName urlfilter-validator -stdin
Checking combination of these URLFilters: UrlValidator
-http://[2010:836B:4179::836B:4179]/
{noformat}
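As a cross-check that the URL itself is well-formed, here is a minimal sketch using only the JDK's java.net.URI, which parses bracketed IPv6 literals. The class name Ipv6UrlCheck and the helper hostOf are hypothetical, made up for illustration. This is not Nutch code and does not show how urlfilter-validator works internally.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only (not part of Nutch).
class Ipv6UrlCheck {

    /** Returns the host component of the URL, or null if it cannot be parsed. */
    public static String hostOf(String url) {
        try {
            // java.net.URI accepts IPv6 literals enclosed in square brackets.
            return new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Same URL as in the filterchecker run above.
        System.out.println(hostOf("http://[2010:836B:4179::836B:4179]/"));
    }
}
```

For a bracketed IPv6 literal, URI.getHost() returns the address including the square brackets, so a non-null result here indicates that the JDK parser accepts the URL that the filter rejects.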
We should also consider using the class
[UrlValidator|https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html]
from commons-validator directly instead of a modified copy. This would let us
pick up updates and improvements with little effort - IPv6 is already supported,
see the [class
implementation|https://commons.apache.org/proper/commons-validator/apidocs/src-html/org/apache/commons/validator/routines/UrlValidator.html#line.380].