[ https://issues.apache.org/jira/browse/NUTCH-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark DeSpain updated NUTCH-620:
-------------------------------

    Attachment: patch.txt

Here is a patch that updates BasicURLNormalizer so that it collapses adjacent
slashes. It also updates the corresponding unit test.


> BasicURLNormalizer should collapse runs of slashes with a single slash
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-620
>                 URL: https://issues.apache.org/jira/browse/NUTCH-620
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0
>         Environment: JDK 1.6 update 5, Tomcat 6, Windows Server 2003
>            Reporter: Mark DeSpain
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: patch.txt
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The BasicURLNormalizer should collapse each run of slash characters '/' into a
> single slash.
> For example, each of the following URLs should be normalized to
> http://lucene.apache.org/nutch/about.html:
> * http://lucene.apache.org/nutch//about.html 
> * http://lucene.apache.org//nutch/about.html 
> * http://lucene.apache.org/////nutch////about.html (an exaggerated example)
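
A minimal sketch of the requested behaviour, assuming the collapsing should apply only to the path component of an absolute URL. The class SlashCollapser and the method collapseSlashes are hypothetical illustrations, not the code in patch.txt or Nutch's actual BasicURLNormalizer API:

```java
public class SlashCollapser {

    /**
     * Hypothetical helper (not the code from patch.txt): replaces every run of
     * '/' in the path component of an absolute URL with a single '/'.
     * The "//" after the scheme, the query string and the fragment are left as-is.
     */
    public static String collapseSlashes(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd < 0) {
            return url; // not an absolute hierarchical URL; leave it alone
        }
        int length = url.length();

        // The authority (host[:port]) ends at the first '/', '?' or '#'.
        int pathStart = schemeEnd + 3;
        while (pathStart < length && "/?#".indexOf(url.charAt(pathStart)) < 0) {
            pathStart++;
        }

        // The path ends at the first '?' or '#', or at the end of the string.
        int pathEnd = pathStart;
        while (pathEnd < length && "?#".indexOf(url.charAt(pathEnd)) < 0) {
            pathEnd++;
        }

        // "/+" matches one or more consecutive slashes.
        String path = url.substring(pathStart, pathEnd).replaceAll("/+", "/");
        return url.substring(0, pathStart) + path + url.substring(pathEnd);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://lucene.apache.org/nutch//about.html",
            "http://lucene.apache.org//nutch/about.html",
            "http://lucene.apache.org/////nutch////about.html"
        };
        for (String input : inputs) {
            // Each line prints http://lucene.apache.org/nutch/about.html
            System.out.println(collapseSlashes(input));
        }
    }
}
```

Only the path is rewritten, so the "//" after "http:" survives, and slashes inside a query string are preserved: for the hypothetical input http://example.com//a?next=//b the sketch returns http://example.com/a?next=//b.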

-- 
This message is automatically generated by JIRA.