Gerard Bouchar created NUTCH-2564:
-------------------------------------

             Summary: protocol-http throws an error when the content-length header is not a number
                 Key: NUTCH-2564
                 URL: https://issues.apache.org/jira/browse/NUTCH-2564
             Project: Nutch
          Issue Type: Sub-task
            Reporter: Gerard Bouchar

When a server sends an invalid Content-Length header (one that is not a valid number) with a plain-text HTTP body, browsers simply ignore it. The protocol-http plugin, however, handles it inconsistently: if the header value consists only of whitespace, the header is ignored, but if it contains any other characters, an error is thrown, which prevents us from doing anything with the page.

If the HTTP body is chunked, protocol-http always ignores the Content-Length header, whether it is valid or not.

protocol-http should simply ignore invalid Content-Length headers.

Relevant code: [https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L354-L359]
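
To illustrate the requested behavior, here is a minimal sketch of lenient Content-Length parsing. It is not the actual code in HttpResponse.java: the class name ContentLengthSketch and the method parseContentLength are hypothetical, and treating negative values as invalid is an assumption of this sketch.

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of Nutch: shows one way to ignore
// an invalid Content-Length header instead of throwing an error.
final class ContentLengthSketch {

    private ContentLengthSketch() {
    }

    /**
     * Returns the declared body length, or an empty OptionalLong when the
     * header is missing, blank, negative, or not a number. An empty result
     * means "ignore the header and read the body until the stream ends".
     */
    static OptionalLong parseContentLength(String headerValue) {
        if (headerValue == null) {
            return OptionalLong.empty();
        }
        String trimmed = headerValue.trim();
        if (trimmed.isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            long length = Long.parseLong(trimmed);
            // Assumption of this sketch: a negative length is treated as invalid.
            return length >= 0 ? OptionalLong.of(length) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            // Not a number: ignore the header rather than fail the fetch.
            return OptionalLong.empty();
        }
    }
}
```

With a helper like this, the caller would fall back to reading until end of stream whenever the result is empty, so whitespace-only and non-numeric values are handled the same way.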