[
https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471706#comment-16471706
]
ASF GitHub Bot commented on NUTCH-2575:
---------------------------------------
Omkar20895 commented on issue #327: NUTCH-2575 Storing total number of bytes
read after every chunk
URL: https://github.com/apache/nutch/pull/327#issuecomment-388318227
Thank you @sebastian-nagel
> protocol-http does not respect the maximum content-size for chunked responses
> -----------------------------------------------------------------------------
>
> Key: NUTCH-2575
> URL: https://issues.apache.org/jira/browse/NUTCH-2575
> Project: Nutch
> Issue Type: Sub-task
> Components: protocol
> Affects Versions: 1.14
> Reporter: Gerard Bouchar
> Priority: Critical
> Fix For: 1.15
>
>
> There is a bug in HttpResponse::readChunkedContent that prevents it from
> stopping once the content read exceeds the maximum allowed size.
> There [is a variable
> contentBytesRead|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L404]
> that is meant to track how much content has been read, but it is never
> updated, so it always stays at 0, and [the size
> check|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L440-L442]
> always evaluates to false (unless a single chunk is larger than the maximum
> allowed content size).
> This allows any server to cause out-of-memory errors on our side.
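
For illustration only, here is a minimal sketch of the idea behind the fix (keeping a running total of bytes read across chunks). It is not the code from HttpResponse or from the pull request: the class ChunkedLimitSketch, the method readChunks and its parameters are hypothetical, the chunks are passed in as in-memory arrays instead of being parsed from a socket, and treating a negative limit as "no limit" is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

public class ChunkedLimitSketch {

  // Hypothetical helper, not Nutch's actual code: copies chunks into a buffer
  // and stops once maxContent bytes have been stored.
  // Assumption: a negative maxContent means "no limit".
  static byte[] readChunks(List<byte[]> chunks, int maxContent) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int contentBytesRead = 0; // running total across all chunks

    for (byte[] chunk : chunks) {
      int toCopy = chunk.length;
      // Truncate the current chunk if it would push the total past the limit.
      if (maxContent >= 0 && contentBytesRead + toCopy > maxContent) {
        toCopy = maxContent - contentBytesRead;
      }
      out.write(chunk, 0, toCopy);
      // The update the issue reports as missing: without it the total never
      // grows and the limit check only fires for a single oversized chunk.
      contentBytesRead += toCopy;
      if (maxContent >= 0 && contentBytesRead >= maxContent) {
        break; // limit reached, ignore the remaining chunks
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    List<byte[]> chunks =
        Arrays.asList(new byte[40], new byte[40], new byte[40]);
    // Three 40-byte chunks with a 100-byte limit: the third is cut to 20 bytes.
    System.out.println(readChunks(chunks, 100).length);
  }
}
```

With the running total in place, many small chunks are capped just like one large chunk, so the amount of memory used no longer depends on how the server splits its response.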