[ 
https://issues.apache.org/jira/browse/NUTCH-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471136#comment-16471136
 ] 

Sebastian Nagel commented on NUTCH-2562:
----------------------------------------

Confirmed and reproduced. Reading presumably continued through the remaining 
chunks in order to reach the optional trailing headers. But you're right: it is 
better to stop and skip the remaining content along with any trailing headers.
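
To illustrate the intended behaviour, here is a minimal sketch of a chunked-body 
reader that stops as soon as the content limit is reached instead of trying to 
parse the next chunk size. It is not the code in HttpResponse.java: the class 
ChunkedBodySketch, the methods readLine and readChunked, and the maxContent 
parameter (standing in for http.getMaxContent()) are assumptions made for this 
example only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Nutch. */
final class ChunkedBodySketch {

  /** Reads one line up to LF and returns it without CR/LF. */
  static String readLine(InputStream in) throws IOException {
    ByteArrayOutputStream line = new ByteArrayOutputStream();
    int b;
    while ((b = in.read()) != -1 && b != '\n') {
      if (b != '\r') {
        line.write(b);
      }
    }
    return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
  }

  /**
   * Decodes a chunked body and keeps at most maxContent bytes.
   * Once the limit is reached the method returns immediately: the rest of
   * the current chunk, later chunks and trailing headers are left unread.
   */
  static byte[] readChunked(InputStream in, int maxContent) throws IOException {
    ByteArrayOutputStream content = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (true) {
      String sizeLine = readLine(in);
      int ext = sizeLine.indexOf(';'); // ignore chunk extensions, if any
      if (ext >= 0) {
        sizeLine = sizeLine.substring(0, ext);
      }
      int chunkLen;
      try {
        chunkLen = Integer.parseInt(sizeLine.trim(), 16);
      } catch (NumberFormatException e) {
        throw new IOException("bad chunk length: " + sizeLine);
      }
      if (chunkLen < 0) {
        throw new IOException("bad chunk length: " + sizeLine);
      }
      if (chunkLen == 0) {
        break; // last chunk: trailing headers (if any) are not read
      }
      int remaining = chunkLen;
      while (remaining > 0) {
        int room = maxContent - content.size();
        if (room <= 0) {
          return content.toByteArray(); // limit hit mid-chunk: stop, truncated
        }
        int want = Math.min(Math.min(remaining, room), buf.length);
        int n = in.read(buf, 0, want);
        if (n == -1) {
          throw new IOException("unexpected EOF inside chunk");
        }
        content.write(buf, 0, n);
        remaining -= n;
      }
      if (content.size() >= maxContent) {
        return content.toByteArray(); // limit hit exactly at a chunk boundary
      }
      readLine(in); // consume the CRLF that terminates the chunk data
    }
    return content.toByteArray();
  }
}
```

The point of the sketch is the early return inside the inner loop: a reader that 
instead breaks out of that loop and falls through to the next readLine() call 
would parse leftover chunk data as a chunk size, which is the kind of failure 
reported in this issue.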

> protocol-http fails to read large chunked HTTP responses
> --------------------------------------------------------
>
>                 Key: NUTCH-2562
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2562
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Gerard Bouchar
>            Priority: Major
>
> While reading chunked content, if the content size becomes larger than 
> http.getMaxContent(), instead of just stopping and truncating the content, it 
> tries to read a new chunk before having read the previous one completely, 
> resulting in a 'bad chunk length' error.
>
> See: 
> https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L440-L442



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
