Sebastian Nagel commented on NUTCH-1344:

Is there any reason why https should be treated differently from http (and ftp)?
> BasicURLNormalizer to normalize https same as http 
> ---------------------------------------------------
>                 Key: NUTCH-1344
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1344
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: nutchgora, 1.6
>            Reporter: Sebastian Nagel
>         Attachments: NUTCH-1344.patch
> Most of the normalization done by BasicURLNormalizer (lowercasing host, 
> removing default port, removal of page anchors, cleaning . and .. in the path) 
> is not done for URLs with protocol https.
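
To make the listed steps concrete, here is a minimal sketch of scheme-independent normalization. It is not the code of BasicURLNormalizer nor of the attached patch; the class name UrlNormalizeSketch and its methods are hypothetical, and it relies only on java.net.URI. It lowercases the host, drops the scheme's default port, removes the anchor, and resolves . and .. path segments the same way for http, https and ftp.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical illustration only; not Nutch's BasicURLNormalizer.
final class UrlNormalizeSketch {

    // Default port per scheme; -1 means "scheme not handled by this sketch".
    static int defaultPort(String scheme) {
        switch (scheme) {
            case "http":  return 80;
            case "https": return 443;
            case "ftp":   return 21;
            default:      return -1;
        }
    }

    // Returns the normalized URL, or the input unchanged if it cannot be
    // parsed or uses a scheme this sketch does not handle.
    static String normalize(String url) {
        URI uri;
        try {
            // URI.normalize() resolves "." and ".." path segments.
            uri = new URI(url).normalize();
        } catch (URISyntaxException e) {
            return url;
        }
        if (uri.getScheme() == null || uri.getHost() == null) {
            return url;
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        int dflt = defaultPort(scheme);
        if (dflt < 0) {
            return url;
        }

        StringBuilder sb = new StringBuilder(scheme).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        // Host names are case-insensitive, so lowercase them.
        sb.append(uri.getHost().toLowerCase(Locale.ROOT));
        // Keep the port only if it differs from the scheme's default.
        if (uri.getPort() != -1 && uri.getPort() != dflt) {
            sb.append(':').append(uri.getPort());
        }
        String path = uri.getRawPath();
        sb.append(path == null || path.isEmpty() ? "/" : path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        // The fragment (page anchor) is intentionally not appended.
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("https://WWW.Example.COM:443/a/./b/../c.html#top"));
        System.out.println(normalize("http://WWW.Example.COM:80/a/./b/../c.html#top"));
    }
}
```

With this sketch both calls in main yield the same host and path, differing only in the scheme, which is the behaviour the issue asks for.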

