[
https://issues.apache.org/jira/browse/NUTCH-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2746.
------------------------------------
Resolution: Fixed
Merged/committed. Note: the old behavior remains the default, i.e. IDNs are
not normalized and trailing dots in host names are not stripped.
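
For illustration only, here is a minimal sketch of the two host name
normalizations mentioned above: converting an IDN to its ASCII (Punycode)
form and stripping a trailing dot. It relies on the JDK's java.net.IDN and is
not the code committed to BasicURLNormalizer; the class and method names are
hypothetical.

```java
import java.net.IDN;
import java.util.Locale;

// Hypothetical helper, not part of Nutch.
public class HostNormalizerSketch {

    /**
     * Lowercases the host, strips a single trailing dot and converts
     * Unicode labels to their ASCII (Punycode) representation.
     */
    public static String normalizeHost(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        // Strip the trailing dot first so that IDN.toASCII never sees
        // an empty final label.
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1);
        }
        // Throws IllegalArgumentException if the host is not a valid IDN.
        return IDN.toASCII(h);
    }

    public static void main(String[] args) {
        // \u00fc is the letter u with diaeresis
        System.out.println(normalizeHost("B\u00fccher.Example."));
        // prints "xn--bcher-kva.example"
    }
}
```

The reverse direction (ASCII to Unicode) could be sketched the same way with
IDN.toUnicode.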
> Basic URL normalizer to normalize Unicode domain names
> ------------------------------------------------------
>
> Key: NUTCH-2746
> URL: https://issues.apache.org/jira/browse/NUTCH-2746
> Project: Nutch
> Issue Type: Improvement
> Components: plugin, urlnormalizer
> Affects Versions: 1.16
> Reporter: Sebastian Nagel
> Priority: Major
> Fix For: 1.17
>
>
> The BasicURLNormalizer (plugin urlnormalizer-basic) cannot normalize IDNs
> (Unicode host/domain names).