[ https://issues.apache.org/jira/browse/NUTCH-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951956#comment-16951956 ]

Sebastian Nagel commented on NUTCH-2746:
----------------------------------------

PR open. URLUtil does provide methods for IDN conversion, but none of the URL 
normalizers use them. The patch tries to keep the overhead minimal and only 
performs the IDN conversion when necessary. BasicURLNormalizer already 
operates on the parts of the URL (host, path, query), so no additional 
parsing/splitting of URLs is needed.
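
To illustrate the "only if necessary" idea, here is a minimal sketch and not 
the code from the PR. It assumes the host has already been split off from the 
URL and that the conversion direction is Unicode to ASCII (Punycode). The 
class and method names (IdnHostSketch, toAsciiIfNeeded) are hypothetical; 
only java.net.IDN is a real JDK API.

```java
import java.net.IDN;

// Hypothetical illustration; not part of Nutch or of the actual patch.
public class IdnHostSketch {

  /** Returns true if every character of the host is 7-bit ASCII. */
  static boolean isAscii(String host) {
    for (int i = 0; i < host.length(); i++) {
      if (host.charAt(i) > 0x7f) {
        return false;
      }
    }
    return true;
  }

  /**
   * Converts a Unicode host name to its ASCII (Punycode) form, skipping
   * the conversion entirely when the host is already pure ASCII.
   */
  static String toAsciiIfNeeded(String host) {
    if (isAscii(host)) {
      return host; // cheap path: nothing to convert
    }
    try {
      return IDN.toASCII(host);
    } catch (IllegalArgumentException e) {
      // Assumption: leave hosts that are not valid IDNs unchanged.
      return host;
    }
  }

  public static void main(String[] args) {
    System.out.println(toAsciiIfNeeded("example.org"));
    System.out.println(toAsciiIfNeeded("b\u00fccher.de"));
  }
}
```

Because the check runs on the host part alone, the common case of an 
ASCII-only host costs a single pass over the string and no call to IDN.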

> Basic URL normalizer to normalize Unicode domain names
> ------------------------------------------------------
>
>                 Key: NUTCH-2746
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2746
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.16
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.17
>
>
> The BasicURLNormalizer (plugin urlnormalizer-basic) currently cannot 
> normalize IDNs (Unicode host/domain names).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
