Remove double slashes
---------------------
Key: NUTCH-1011
URL: https://issues.apache.org/jira/browse/NUTCH-1011
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.4, 2.0
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
Many websites produce faulty URLs containing runs of consecutive slashes, e.g.
http://cocoon.apache.org///////////////////////1.x/dynamic.html
This is especially harmful when the number of slashes varies: many distinct
URLs then point to the same page, and each of them can generate new (unique)
URLs to the same or other duplicate pages.
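
To illustrate the requested normalization, here is a minimal sketch in Python.
It is not Nutch code and does not show how the issue was eventually resolved;
the function name collapse_slashes is hypothetical. It assumes that only the
path component should be touched, so the "//" after the scheme and any slashes
in the query string or fragment are left alone.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse runs of two or more slashes in the path of a URL.

    Hypothetical helper for illustration only. Scheme, host, query
    and fragment are passed through unchanged.
    """
    parts = urlsplit(url)
    # Replace every run of 2+ slashes in the path with a single slash.
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Variants that differ only in the number of slashes map to one URL.
    for u in (
        "http://cocoon.apache.org///1.x/dynamic.html",
        "http://cocoon.apache.org//////1.x//dynamic.html",
    ):
        print(collapse_slashes(u))
```

With a rule like this applied before URLs are stored, variants that differ only
in the number of slashes would collapse to a single entry.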