Sebastian Nagel created NUTCH-1879:
--------------------------------------

             Summary: Regex URL normalizer should remove multiple slashes after file: protocol
                 Key: NUTCH-1879
                 URL: https://issues.apache.org/jira/browse/NUTCH-1879
             Project: Nutch
          Issue Type: Sub-task
          Components: protocol
    Affects Versions: 2.2.1, 1.9
            Reporter: Sebastian Nagel
             Fix For: 2.3, 1.10


urlnormalizer-regex should replace multiple slashes after the {{file:}} protocol 
with a single slash ({{file:///}} -> {{file:/}}):
* required by NUTCH-1483 to get a consistent canonical form for file URLs, 
because URL.toString() also emits the single-slash form
* would obsolete NUTCH-1878
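
The rewrite requested above could be sketched roughly as follows. This is only an illustration in plain Java, not the actual urlnormalizer-regex rule or the eventual patch: the class name, the method name and the pattern are assumptions made up for this example. Note that such a pattern would also collapse {{file://host/path}} into {{file:/host/path}}, which may or may not be desired.

```java
import java.util.regex.Pattern;

public class FileUrlSlashSketch {

    // Hypothetical pattern (not taken from the Nutch configuration):
    // "file:" at the start of the URL, followed by two or more slashes.
    private static final Pattern FILE_MULTI_SLASH = Pattern.compile("^file:/{2,}");

    // Collapse the run of slashes after "file:" into a single slash.
    // URLs that do not match the pattern are returned unchanged.
    static String normalize(String url) {
        return FILE_MULTI_SLASH.matcher(url).replaceFirst("file:/");
    }

    public static void main(String[] args) {
        System.out.println(normalize("file:///tmp/doc.txt"));   // file:/tmp/doc.txt
        System.out.println(normalize("file:/tmp/doc.txt"));     // file:/tmp/doc.txt
        System.out.println(normalize("http://example.org//a")); // http://example.org//a
    }
}
```

The pattern is anchored at the start of the string so that slashes elsewhere in the URL, and URLs with other protocols, are left alone.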



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
