[
https://issues.apache.org/jira/browse/NUTCH-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086900#comment-18086900
]
Hudson commented on NUTCH-3176:
-------------------------------
SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #240 (See
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/240/])
NUTCH-3176 URLUtil and urlnormalizer-basic: add support for IDNA2008 (snagel:
[https://github.com/apache/nutch/commit/54efa9f0c2f153239e1c48099d1289c353a7f5ca])
* (edit) conf/nutch-default.xml
* (edit) ivy/ivy.xml
* (edit) src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
* (edit) src/java/org/apache/nutch/util/URLUtil.java
* (edit) src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java
* (edit) src/test/org/apache/nutch/util/TestURLUtil.java
NUTCH-3176 URLUtil and urlnormalizer-basic: add support for IDNA2008 (snagel:
[https://github.com/apache/nutch/commit/66f55e1e427e400de5ec1d84f7332b241cfc3c61])
* (edit) src/java/org/apache/nutch/util/URLUtil.java
NUTCH-3176 URLUtil and urlnormalizer-basic: add support for IDNA2008 (snagel:
[https://github.com/apache/nutch/commit/947cd288f5b301a55ec6d502080d5fc28dc82a62])
* (edit) src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
NUTCH-3176 URLUtil and urlnormalizer-basic: add support for IDNA2008 (snagel:
[https://github.com/apache/nutch/commit/84699a62c21ab8df6b78429ac325cd84c985a01d])
* (edit) src/test/org/apache/nutch/util/TestURLUtil.java
* (edit) src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java
> URLUtil and urlnormalizer-basic: add support for IDNA2008
> ---------------------------------------------------------
>
> Key: NUTCH-3176
> URL: https://issues.apache.org/jira/browse/NUTCH-3176
> Project: Nutch
> Issue Type: New Feature
> Components: plugin, urlnormalizer, util
> Affects Versions: 1.22
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Major
> Fix For: 1.23
>
>
> IDNA2008, defined in [RFC 5890|https://www.rfc-editor.org/rfc/rfc5890], has
> superseded IDNA2003 ([RFC 3490|https://www.rfc-editor.org/rfc/rfc3490]).
> Despite the name, the IDNA2008 RFCs were published in 2010.
> When processing URLs and host names, IDNA2008 variants now occur from
> time to time and cause issues if they cannot be processed. The corresponding
> Nutch tools, URLUtil and urlnormalizer-basic, should support IDNA2008.
> IDNA2008 allows Unicode characters from versions newer than Unicode 3.2. There
> are also some deviations in the mapping between Unicode and ASCII. For
> example, {{straße.de}} is mapped to {{strasse.de}} by IDNA2003 (an
> irreversible mapping), but to {{xn--strae-oqa.de}} by IDNA2008 (reversible).
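
The IDNA2003 side of the example above can be reproduced with the JDK alone, since java.net.IDN implements RFC 3490. The following is only a minimal sketch: the class name Idna2003Demo and the helper toAscii2003 are hypothetical and are not taken from the Nutch commits. The JDK does not provide an IDNA2008 conversion, so obtaining {{xn--strae-oqa.de}} would need an additional library (ICU4J's UTS #46 support is one option, mentioned here as an assumption and not as a description of what the commits use).

```java
import java.net.IDN;

// Hypothetical demo class, not part of Nutch.
class Idna2003Demo {

    // Converts a host name to its ASCII form using the JDK's IDNA2003
    // implementation (java.net.IDN, RFC 3490).
    static String toAscii2003(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // "stra\u00DFe.de" is "straße.de"; the escape avoids any dependency
        // on the source file encoding.
        String host = "stra\u00DFe.de";
        // IDNA2003 (Nameprep) folds U+00DF to "ss", so the original
        // spelling cannot be recovered from the result.
        System.out.println(host + " -> " + toAscii2003(host));
    }
}
```

Because the sharp s is folded away before encoding, the IDNA2003 result contains no xn-- label at all, which is why the mapping cannot be reversed.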
--
This message was sent by Atlassian Jira
(v8.20.10#820010)