[ https://issues.apache.org/jira/browse/NUTCH-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward Drapkin updated NUTCH-1112:
----------------------------------
Description:
This line of code is in
protocol-httpclient/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java:
while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
    && totalRead + bufferFilled < contentLength) {
  ...
}
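When the entire content length is less than the size of the buffer, the entire
content is read into the buffer in a single call (so bufferFilled ==
contentLength). The test `totalRead + bufferFilled < contentLength` is then
false on the first iteration, the loop body never runs, and the HttpResponse
object ends up with empty content. Similarly, for larger responses the last
buffer (up to BUFFER_SIZE bytes) is skipped whenever it brings the total to
exactly contentLength. This simply needs to be changed to
`totalRead + bufferFilled <= contentLength`.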
Thanks!
was:
This line of code is in
protocol-httpclient/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java:
while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
    && totalRead + bufferFilled < contentLength) {
  ...
}
When the entire content length is less than the size of the buffer, the entire
content will be read into the buffer (and bufferFilled == contentLength) and
the HttpResponse object here will have empty content. This simply needs to be
changed to `totalRead + bufferFilled <= contentLength`.
Thanks!
Summary: off-by-one error in protocol-httpclient; truncates up to
HttpBase.BUFFER_SIZE content (was: protocol-httpclient doesn't accept content
when all of it fits in the buffer at once)
> off-by-one error in protocol-httpclient; truncates up to HttpBase.BUFFER_SIZE
> content
> -------------------------------------------------------------------------------------
>
> Key: NUTCH-1112
> URL: https://issues.apache.org/jira/browse/NUTCH-1112
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.3
> Reporter: Edward Drapkin
>
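For illustration, here is a minimal standalone sketch of the two loop conditions. It is not the Nutch source: the class name ReadLoopSketch, the tiny BUFFER_SIZE of 8, and the loop body (accumulate totalRead, copy the bytes to an output stream) are assumptions, since the report elides the body with `...`. Only the loop condition is taken from the report.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ReadLoopSketch {
  // Hypothetical stand-in for HttpBase.BUFFER_SIZE; kept small for the demo.
  static final int BUFFER_SIZE = 8;

  // Reads up to contentLength bytes from in.
  // inclusive == false uses the reported condition (<);
  // inclusive == true uses the proposed condition (<=).
  static byte[] read(InputStream in, int contentLength, boolean inclusive) {
    byte[] buffer = new byte[BUFFER_SIZE];
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int bufferFilled;
    int totalRead = 0;
    try {
      while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
          && (inclusive
              ? totalRead + bufferFilled <= contentLength
              : totalRead + bufferFilled < contentLength)) {
        // Assumed loop body: the report elides it with "...".
        totalRead += bufferFilled;
        out.write(buffer, 0, bufferFilled);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] small = new byte[5];   // fits in one buffer
    byte[] large = new byte[20];  // 8 + 8 + 4 bytes across three reads

    System.out.println(read(new ByteArrayInputStream(small), 5, false).length);  // 0
    System.out.println(read(new ByteArrayInputStream(small), 5, true).length);   // 5
    System.out.println(read(new ByteArrayInputStream(large), 20, false).length); // 16
    System.out.println(read(new ByteArrayInputStream(large), 20, true).length);  // 20
  }
}
```

With the strict `<`, the 5-byte body comes back empty and the 20-byte body loses its final 4-byte chunk. With `<=`, both are returned in full.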
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira