Gerard Bouchar created NUTCH-2555:
-------------------------------------

             Summary: URL normalization problem: path not starting with a '/'
                 Key: NUTCH-2555
                 URL: https://issues.apache.org/jira/browse/NUTCH-2555
             Project: Nutch
          Issue Type: Sub-task
            Reporter: Gerard Bouchar

When a URL has no path but does have GET parameters (for instance 'http://example.com?a=1'), it should be normalized by adding a '/' at the beginning of the path (giving 'http://example.com/?a=1').

Our logs show that non-normalized URLs reach protocol-http, which then tries to send an invalid HTTP request:

    GET ?a=1 HTTP/1.0

instead of

    GET /?a=1 HTTP/1.0
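
The normalization described above can be sketched with the JDK's java.net.URI alone. This is only an illustration of the expected behaviour, not the Nutch normalizer code or a proposed patch: the class name EmptyPathNormalizer and the method normalize are hypothetical, and the choice to return unparsable or non-hierarchical input unchanged is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: inserts the missing '/' when a URL
// has an authority (host) but an empty path.
class EmptyPathNormalizer {

    static String normalize(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            // Assumption: leave input we cannot parse untouched.
            return url;
        }

        // Only hierarchical URLs with a scheme and an authority are handled.
        if (uri.isOpaque() || uri.getScheme() == null || uri.getRawAuthority() == null) {
            return url;
        }

        // A non-empty path needs no fix.
        String path = uri.getRawPath();
        if (path != null && !path.isEmpty()) {
            return url;
        }

        // Rebuild from the raw (still percent-encoded) components,
        // placing '/' between the authority and the query or fragment.
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://").append(uri.getRawAuthority()).append('/');
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints http://example.com/?a=1
        System.out.println(normalize("http://example.com?a=1"));
    }
}
```

With such a step in place, the request line built from the URL's path and query would start with '/', as in the second request shown above.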