[ https://issues.apache.org/jira/browse/NUTCH-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509865#comment-16509865 ]
Hudson commented on NUTCH-2555:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-trunk #3534 (See [https://builds.apache.org/job/Nutch-trunk/3534/])
NUTCH-2555 URL normalization problem: path not starting with a '/' For (snagel: [https://github.com/apache/nutch/commit/6239655b6fd959b637ae3948f616f393aa99f159])
* (edit) src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java
* (edit) src/plugin/protocol-http/src/test/org/apache/nutch/protocol/http/TestBadServerResponses.java
* (edit) src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
* (edit) src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java


> URL normalization problem: path not starting with a '/'
> -------------------------------------------------------
>
>                 Key: NUTCH-2555
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2555
>             Project: Nutch
>          Issue Type: Sub-task
>    Affects Versions: 1.14
>            Reporter: Gerard Bouchar
>            Priority: Major
>             Fix For: 1.15
>
>
> When a URL has no path but does have GET parameters (for instance
> 'http://example.com?a=1'), it should be normalized by adding a '/' at the
> beginning of the path (giving 'http://example.com/?a=1'). Our logs show that
> non-normalized URLs reach protocol-http, which then uses URL::getFile() to
> get the path and tries to send an invalid HTTP request:
>     GET ?a=1 HTTP/1.0
> instead of
>     GET /?a=1 HTTP/1.0
>
> Example URL for which this poses a problem:
> http://news.fx678.com?171
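
The behaviour described in the issue can be reproduced with plain java.net.URL. What follows is only a minimal sketch, not the code from the commit above: the class and method names (EmptyPathSketch, requestTarget) are hypothetical, and it assumes that prepending '/' to the result of URL::getFile() is enough to produce a valid request target.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical illustration; not the code from the NUTCH-2555 commit.
public class EmptyPathSketch {

    // Returns the path-plus-query part to put on an HTTP request line,
    // prepending '/' when the URL has no path.
    static String requestTarget(String urlString) {
        URL url;
        try {
            url = new URL(urlString);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Malformed URL: " + urlString, e);
        }
        // getFile() is the path followed by "?" and the query, if any.
        // For a URL with an empty path and a query it starts with '?'.
        String file = url.getFile();
        return file.startsWith("/") ? file : "/" + file;
    }

    public static void main(String[] args) {
        String raw = "http://example.com?a=1";
        // Prints: GET /?a=1 HTTP/1.0
        System.out.println("GET " + requestTarget(raw) + " HTTP/1.0");
    }
}
```

Fixing the URL once in the normalizer means every later consumer sees the corrected form. The extra check when building the request line is a second safeguard for URLs that bypass normalization.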