[ 
https://issues.apache.org/jira/browse/NUTCH-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854267#comment-13854267
 ] 

İlhami KALKAN commented on NUTCH-1321:
--------------------------------------

Hi Sebastian,
1-) This code block belongs to the old patch version, Nutch-1321.patch. 
Sorry for not removing it. The new version of isPunycode(url) is in 
idnNormalizer.patch.
2-) This patch converts only the 'url' field from Punycode back to Unicode 
while indexing. The 'id' field is not converted; it keeps the Punycode 
value while indexing. Is this enough for updating and deleting indexed 
documents? Or, if we need to keep the Punycode URL, can you explain a 
little more why we need it?
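
For illustration, here is a minimal sketch of the behaviour described in 
point 2. It is not the code from idnNormalizer.patch. It assumes that the 
Punycode check looks for the "xn--" ACE prefix on a host label and that the 
conversion uses the JDK's java.net.IDN. The names IdnSketch and 
toUnicodeUrl are hypothetical.

```java
import java.net.IDN;
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch, not the code from idnNormalizer.patch.
final class IdnSketch {

    // ACE prefix that marks a Punycode-encoded host label.
    private static final String ACE_PREFIX = "xn--";

    /** Returns true if any label of the host starts with "xn--". */
    public static boolean isPunycode(String host) {
        if (host == null) {
            return false;
        }
        for (String label : host.toLowerCase(Locale.ROOT).split("\\.")) {
            if (label.startsWith(ACE_PREFIX)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the URL with a Punycode host converted to Unicode.
     * URLs without a Punycode host are returned unchanged.
     */
    public static String toUnicodeUrl(String urlString) {
        URI uri = URI.create(urlString);
        String host = uri.getHost();
        if (!isPunycode(host)) {
            return urlString;
        }
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        // IDN.toUnicode returns its input unchanged if decoding fails.
        sb.append(IDN.toUnicode(host));
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        if (uri.getRawPath() != null) {
            sb.append(uri.getRawPath());
        }
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'id' keeps the Punycode form; only 'url' is converted.
        String id = "http://xn--mnchen-3ya.de/index.html";
        String url = toUnicodeUrl(id);
        System.out.println("id  = " + id);
        System.out.println("url = " + url);
    }
}
```

In this sketch the 'id' value is never modified, so a later update or 
delete could still address the document by the same Punycode key that the 
crawler uses.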

> IDNNormalizer
> -------------
>
>                 Key: NUTCH-1321
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1321
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.9
>
>         Attachments: idnNormalizer.patch
>
>
> Right now, IDNs are indexed as ASCII. An IDNNormalizer is to be used with an 
> indexer so it will encode ASCII URLs to their proper Unicode equivalent.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
