[
https://issues.apache.org/jira/browse/NUTCH-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951956#comment-16951956
]
Sebastian Nagel commented on NUTCH-2746:
----------------------------------------
PR open. URLUtil does provide methods for this, but none of the URL
normalizers use them. The patch keeps the overhead low and performs the IDN
conversion only when necessary. BasicURLNormalizer already operates on the
parts of the URL (host, path, query), which makes additional parsing/splitting
of URLs unnecessary.
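
To illustrate the "convert only if necessary" idea, here is a minimal sketch
using the JDK's java.net.IDN. It is not the code from the patch: the class name
IdnHostSketch and the helper names isAscii/normalizeHost are hypothetical, and
it assumes that "necessary" means "the host contains at least one non-ASCII
character". Note that java.net.IDN implements IDNA2003.

```java
import java.net.IDN;
import java.util.Locale;

// Hypothetical sketch, not the actual BasicURLNormalizer code.
final class IdnHostSketch {

  /** Returns true if every character of the host is 7-bit ASCII. */
  static boolean isAscii(String host) {
    for (int i = 0; i < host.length(); i++) {
      if (host.charAt(i) > 0x7f) {
        return false;
      }
    }
    return true;
  }

  /**
   * Lowercases the host and converts it to its ASCII (Punycode) form,
   * calling IDN.toASCII only when a non-ASCII character is present
   * (assumed meaning of "only if necessary").
   */
  static String normalizeHost(String host) {
    if (isAscii(host)) {
      // Cheap path: nothing to convert.
      return host.toLowerCase(Locale.ROOT);
    }
    try {
      return IDN.toASCII(host).toLowerCase(Locale.ROOT);
    } catch (IllegalArgumentException e) {
      // Host is not convertible: leave it unchanged.
      return host;
    }
  }

  public static void main(String[] args) {
    System.out.println(normalizeHost("Example.COM"));
    System.out.println(normalizeHost("b\u00fccher.example"));
  }
}
```

The host is passed in already separated from the rest of the URL, which
mirrors the point above that the normalizer already works on URL parts and
needs no extra parsing.
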
> Basic URL normalizer to normalize Unicode domain names
> ------------------------------------------------------
>
> Key: NUTCH-2746
> URL: https://issues.apache.org/jira/browse/NUTCH-2746
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.16
> Reporter: Sebastian Nagel
> Priority: Major
> Fix For: 1.17
>
>
> The BasicURLNormalizer (plugin urlnormalizer-basic) lacks the possibility to
> normalize IDNs (Unicode host/domain names).