Gerard Bouchar created NUTCH-2575:
-------------------------------------

             Summary: protocol-http does not respect the maximum content-size
                 Key: NUTCH-2575
                 URL: https://issues.apache.org/jira/browse/NUTCH-2575
             Project: Nutch
          Issue Type: Sub-task
            Reporter: Gerard Bouchar


There is a bug in HttpResponse::readChunkedContent that prevents it from 
stopping once the content read exceeds the maximum allowed size.

There [is a variable contentBytesRead|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L404] 
that is used to track how much content has been read, but it is never updated, 
so it always stays at zero, and [the size check|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L440-L442] 
always evaluates to false (unless a single chunk is larger than the maximum 
allowed content size).
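To illustrate the intended behaviour, here is a minimal standalone sketch. It is not the Nutch code: the class ChunkedReaderSketch and its methods are hypothetical. It decodes an HTTP/1.1 chunked body and keeps a running total across all chunks, so the limit applies to the whole body rather than to each chunk. It assumes a negative limit means "no limit" and ignores chunk trailers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Nutch's HttpResponse: decodes a chunked body and
// stops once maxContent bytes have been collected (negative = no limit).
final class ChunkedReaderSketch {

  static byte[] readChunked(InputStream in, int maxContent) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int contentBytesRead = 0; // running total across ALL chunks read so far

    while (true) {
      String sizeLine = readLine(in);
      // Chunk extensions (";name=value") may follow the hex size; ignore them.
      int semi = sizeLine.indexOf(';');
      String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
      int chunkLen = Integer.parseInt(hex, 16);
      if (chunkLen == 0) {
        break; // last chunk; trailers are not processed in this sketch
      }

      boolean truncated = false;
      if (maxContent >= 0 && contentBytesRead + chunkLen > maxContent) {
        chunkLen = maxContent - contentBytesRead; // read only what still fits
        truncated = true;
      }

      byte[] buf = new byte[chunkLen];
      readFully(in, buf);
      out.write(buf, 0, chunkLen);
      contentBytesRead += chunkLen; // the update that is missing in the bug

      if (truncated) {
        break; // limit reached: do not read any further chunks
      }
      readLine(in); // consume the CRLF that terminates the chunk data
    }
    return out.toByteArray();
  }

  // Convenience wrapper for in-memory ASCII input.
  static String decode(String raw, int maxContent) {
    try {
      byte[] body = readChunked(
          new ByteArrayInputStream(raw.getBytes(StandardCharsets.US_ASCII)), maxContent);
      return new String(body, StandardCharsets.US_ASCII);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static void readFully(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) throw new EOFException("truncated chunk");
      off += n;
    }
  }

  // Reads one line, dropping the CR/LF terminator.
  private static String readLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1 && c != '\n') {
      if (c != '\r') sb.append((char) c);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String raw = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
    System.out.println(decode(raw, -1)); // Wikipedia
    System.out.println(decode(raw, 5));  // Wikip
  }
}
```

With a limit of 5, neither chunk (4 and 5 bytes) exceeds the limit on its own. The limit is only enforced because contentBytesRead is 4 when the second chunk arrives. If the counter stayed at zero, as described above, the second check would compare 0 + 5 > 5 and the whole body would be read.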



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
