[ https://issues.apache.org/jira/browse/NUTCH-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854267#comment-13854267 ]
İlhami KALKAN commented on NUTCH-1321:
--------------------------------------

Hi Sebastian,

1) This code block belongs to the old patch version, Nutch-1321.patch. Sorry for not removing it. The new version of isPunycode(url) is in idnNormalizer.patch.

2) This patch converts only the 'url' field from Punycode back to Unicode while indexing. The 'id' field is not converted and keeps its Punycode value while indexing. Is this enough for updating and deleting indexed documents? Or, if we need the Punycode URL, can you explain a little more why we need it?

> IDNNormalizer
> -------------
>
>                 Key: NUTCH-1321
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1321
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.9
>
>         Attachments: idnNormalizer.patch
>
>
> Right now, IDNs are indexed as ASCII. An IDNNormalizer is to be used with an
> indexer so it will encode ASCII URLs to their proper Unicode equivalent.
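
For reference, the behaviour described in point 2 (convert a Punycode host back to Unicode for the indexed 'url' value, leave everything else alone) could look roughly like the sketch below. This is not the code from idnNormalizer.patch. The class name IdnSketch, the method toUnicodeUrl and this particular isPunycode check are assumptions made up for illustration; only java.net.IDN and java.net.URI are real JDK APIs.

```java
import java.net.IDN;
import java.net.URI;

// Illustrative sketch only; names here are hypothetical, not taken from the patch.
final class IdnSketch {

    // Assumed check: a host is treated as Punycode if any of its
    // dot-separated labels starts with the ACE prefix "xn--".
    static boolean isPunycode(String host) {
        if (host == null) {
            return false;
        }
        for (String label : host.split("\\.")) {
            if (label.toLowerCase().startsWith("xn--")) {
                return true;
            }
        }
        return false;
    }

    // Returns the URL with its host decoded to Unicode, or the input
    // unchanged when the host is missing or not Punycode.
    static String toUnicodeUrl(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on bad syntax
        String host = uri.getHost();
        if (!isPunycode(host)) {
            return url;
        }
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(IDN.toUnicode(host)); // decode only the host part
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        if (uri.getRawPath() != null) {
            sb.append(uri.getRawPath());
        }
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

In this sketch the caller would index toUnicodeUrl(url) as the 'url' value and keep the original Punycode string as 'id', so the key used for updates and deletes stays the same as the one the crawler stores.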