[ 
https://issues.apache.org/jira/browse/NUTCH-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerard Bouchar updated NUTCH-2555:
----------------------------------
    Description: 
When a URL does not have a path but has GET parameters (for instance 
'http://example.com?a=1'), it should be normalized to add a '/' at the 
beginning of the path (giving 'http://example.com/?a=1'). Our logs show that 
non-normalized URLs reach protocol-http, which then tries to send an invalid 
HTTP request:

GET ?a=1 HTTP/1.0

instead of

GET /?a=1 HTTP/1.0

 

Example URL for which this poses a problem: 
http://news.fx678.com?171

  was:
When a URL does not have a path but has GET parameters (for instance 
'http://example.com?a=1'), it should be normalized to add a '/' at the 
beginning of the path (giving 'http://example.com/?a=1'). Our logs show that 
non-normalized URLs reach protocol-http, which then tries to send an invalid 
HTTP request:

GET ?a=1 HTTP/1.0

instead of

GET /?a=1 HTTP/1.0


> URL normalization problem: path not starting with a '/'
> -------------------------------------------------------
>
>                 Key: NUTCH-2555
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2555
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Gerard Bouchar
>            Priority: Major
>
> When a URL does not have a path but has GET parameters (for instance 
> 'http://example.com?a=1'), it should be normalized to add a '/' at the 
> beginning of the path (giving 'http://example.com/?a=1'). Our logs show that 
> non-normalized URLs reach protocol-http, which then tries to send an invalid 
> HTTP request:
> GET ?a=1 HTTP/1.0
> instead of
> GET /?a=1 HTTP/1.0
>  
> Example URL for which this poses a problem: 
> http://news.fx678.com?171
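
For illustration only (this is not the Nutch code or the eventual fix), the
normalization described above can be sketched in a few lines of Python using
the standard library. The function names ensure_root_path and request_line
are hypothetical, and the sketch assumes the only problem to handle is an
empty path on a URL that has a host.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_root_path(url: str) -> str:
    """Return url with '/' as its path if it has a host but an empty path."""
    parts = urlsplit(url)
    if parts.netloc and not parts.path:
        # 'http://example.com?a=1' parses with path == '' and query == 'a=1'
        parts = parts._replace(path="/")
    return urlunsplit(parts)


def request_line(url: str, method: str = "GET", version: str = "HTTP/1.0") -> str:
    """Build the HTTP request line from the normalized URL (hypothetical helper)."""
    parts = urlsplit(ensure_root_path(url))
    target = parts.path + ("?" + parts.query if parts.query else "")
    return f"{method} {target} {version}"


if __name__ == "__main__":
    print(ensure_root_path("http://example.com?a=1"))
    print(request_line("http://news.fx678.com?171"))
```

With this normalization applied before the request is built, the request
target always starts with '/', so the invalid "GET ?a=1" form cannot be
produced for such URLs.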



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
