Sebastian Nagel created NUTCH-2705:
--------------------------------------

             Summary: urlfilter-validator rejects IPv6 URLs
                 Key: NUTCH-2705
                 URL: https://issues.apache.org/jira/browse/NUTCH-2705
             Project: Nutch
          Issue Type: Bug
          Components: plugin
    Affects Versions: 1.15
            Reporter: Sebastian Nagel
             Fix For: 1.16


The plugin urlfilter-validator rejects URLs that use an IPv6 address as 
hostname/authority (written in brackets according to [RFC 
2732|https://tools.ietf.org/html/rfc2732]):
{noformat}
% echo "http://[2010:836B:4179::836B:4179]/" \
    | bin/nutch filterchecker -filterName urlfilter-validator -stdin
Checking combination of these URLFilters: UrlValidator 
-http://[2010:836B:4179::836B:4179]/
{noformat}
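
For comparison, here is a minimal sketch of how a bracketed IPv6 authority can be recognized with the JDK alone. It is not the plugin's code and does not use commons-validator; the class name Ipv6HostCheck and the method hasIpv6LiteralHost are hypothetical names chosen for illustration. It relies on java.net.URI, which parses RFC 2732 literals and returns the host including the square brackets.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper, for illustration only (not part of Nutch). */
final class Ipv6HostCheck {

    private Ipv6HostCheck() {
    }

    /**
     * Returns true if the URL parses and its host is a bracketed
     * IPv6 literal as described in RFC 2732.
     */
    static boolean hasIpv6LiteralHost(String url) {
        try {
            // java.net.URI validates the IPv6 literal while parsing and
            // throws URISyntaxException if the bracketed part is malformed.
            String host = new URI(url).getHost();
            // For IPv6 literals getHost() keeps the enclosing brackets.
            return host != null && host.startsWith("[") && host.endsWith("]");
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

With this sketch, Ipv6HostCheck.hasIpv6LiteralHost("http://[2010:836B:4179::836B:4179]/") accepts the URL from the example above, while a plain hostname such as http://example.org/ is not reported as an IPv6 literal.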

We should also consider using the class 
[UrlValidator|https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html]
 from commons-validator directly instead of a modified copy. This would let us 
pick up updates and improvements with little effort. IPv6 is already supported, 
see the [class 
implementation|https://commons.apache.org/proper/commons-validator/apidocs/src-html/org/apache/commons/validator/routines/UrlValidator.html#line.380].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)