Remove double slashes
---------------------

                 Key: NUTCH-1011
                 URL: https://issues.apache.org/jira/browse/NUTCH-1011
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.4, 2.0
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma
            Priority: Minor


Many websites produce faulty URLs containing runs of multiple slashes, e.g.
http://cocoon.apache.org///////////////////////1.x/dynamic.html
This becomes a real problem when the number of slashes varies: many distinct
URLs then point to the same page, and each of them can in turn generate new
(unique) URLs to the same or other duplicate pages.
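
To illustrate the kind of normalization being requested, here is a minimal
sketch that collapses runs of slashes in the path of an absolute URL. It is
not taken from Nutch: the class name SlashCollapser and the method collapse
are hypothetical, and the sketch assumes the scheme separator "://" and
anything after "?" or "#" must be left untouched.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: collapses "//" runs in a URL path.
public final class SlashCollapser {

    // Two or more consecutive slashes.
    private static final Pattern MULTI_SLASH = Pattern.compile("/{2,}");

    private SlashCollapser() {}

    public static String collapse(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd < 0) {
            return url; // assumption: only absolute URLs are handled
        }

        // Skip the authority (host[:port]); it ends at the first '/', '?' or '#'.
        int i = schemeEnd + 3;
        while (i < url.length() && "/?#".indexOf(url.charAt(i)) < 0) {
            i++;
        }
        if (i == url.length() || url.charAt(i) != '/') {
            return url; // no path component present
        }

        // The path ends where the query or fragment begins.
        int pathEnd = i;
        while (pathEnd < url.length() && "?#".indexOf(url.charAt(pathEnd)) < 0) {
            pathEnd++;
        }

        String path = MULTI_SLASH.matcher(url.substring(i, pathEnd)).replaceAll("/");
        return url.substring(0, i) + path + url.substring(pathEnd);
    }
}
```

With this sketch, SlashCollapser.collapse applied to the example URL above
yields http://cocoon.apache.org/1.x/dynamic.html regardless of how many
slashes the original contained. Leaving the query string alone is a
deliberate choice, since a double slash there may be part of a parameter
value such as an embedded URL.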

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
