Gerard Bouchar created NUTCH-2564:
-------------------------------------

             Summary: protocol-http throws an error when the content-length 
header is not a number
                 Key: NUTCH-2564
                 URL: https://issues.apache.org/jira/browse/NUTCH-2564
             Project: Nutch
          Issue Type: Sub-task
            Reporter: Gerard Bouchar


When a server sends an invalid Content-Length header (one that is not a valid 
number) with a plain-text HTTP body, browsers simply ignore it. The 
protocol-http plugin, however, handles it inconsistently: if the header value 
consists only of whitespace, it is ignored, but if it contains any other 
non-numeric characters, an error is thrown, preventing us from doing anything 
with the page.

 

If the HTTP body is chunked, protocol-http always ignores the Content-Length 
header, whether it is valid or not.

 

It should simply ignore invalid Content-Length headers.
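
To illustrate the proposed behavior, here is a minimal sketch of lenient 
Content-Length parsing. It is not the actual HttpResponse code: the class name 
LenientContentLength and the method parse are hypothetical, and treating 
negative values as invalid is an assumption on my part. The idea is that a 
missing, blank, or unparseable value yields "no length known" instead of an 
exception, so the caller can fall back to reading the body until the 
connection closes.

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of Nutch: parses a Content-Length value
// leniently and never throws on bad input.
final class LenientContentLength {

  private LenientContentLength() {}

  /**
   * Returns the parsed length, or an empty OptionalLong when the header is
   * missing, blank, non-numeric, negative, or too large for a long.
   */
  static OptionalLong parse(String headerValue) {
    if (headerValue == null) {
      return OptionalLong.empty();
    }
    String trimmed = headerValue.trim();
    if (trimmed.isEmpty()) {
      // Whitespace-only header: ignore it.
      return OptionalLong.empty();
    }
    try {
      long length = Long.parseLong(trimmed);
      // Assumption: a negative length is treated like any other invalid value.
      return length >= 0 ? OptionalLong.of(length) : OptionalLong.empty();
    } catch (NumberFormatException e) {
      // Invalid header: ignore it rather than failing the whole fetch.
      return OptionalLong.empty();
    }
  }
}
```

A caller would check isPresent() on the result and only enforce the declared 
length when a value was actually parsed.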

 

Relevant code: 
[https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L354-L359]

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
