protocol-httpclient reading more bytes than http.content.limit
--------------------------------------------------------------
Key: NUTCH-560
URL: https://issues.apache.org/jira/browse/NUTCH-560
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 0.9.0, 1.0.0
Reporter: Joseph M.
I modified HttpResponse.java in protocol-httpclient to download files to the
file system. If I set http.content.limit to 5000, it fetches around 5500 to
6000 bytes instead and writes them to the file system. There is a calculation
mistake in the calculateTryToRead() function.
{code}
int tryAndRead = calculateTryToRead(totalRead);
while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 &&
       tryAndRead > 0) {
  totalRead += bufferFilled;
  out.write(buffer, 0, bufferFilled);
  tryAndRead = calculateTryToRead(totalRead);
}
{code}
The while loop stops only when calculateTryToRead() returns a negative value
or 0.
{code}
private int calculateTryToRead(int totalRead) {
  int tryToRead = Http.BUFFER_SIZE;
  if (http.getMaxContent() <= 0) {
    return http.BUFFER_SIZE;
  } else if (http.getMaxContent() - totalRead < http.BUFFER_SIZE) {
    tryToRead = http.getMaxContent() - totalRead;
  }
  return tryToRead;
}
{code}
It returns a negative value only once totalRead > http.getMaxContent(), and
each read requests buffer.length bytes regardless of tryAndRead. So more bytes
than http.content.limit are read before the while loop breaks.
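
One possible way to keep the loop within the limit is sketched below. It is not
the Nutch code or a committed patch: LimitedCopy, copyLimited(), readLimited()
and the maxContent parameter are hypothetical names used for illustration, and
BUFFER_SIZE is only a stand-in for Http.BUFFER_SIZE. The idea is to check the
remaining budget before reading and to pass that budget to in.read() instead of
buffer.length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of Nutch.
final class LimitedCopy {

  // Assumed stand-in for Http.BUFFER_SIZE.
  static final int BUFFER_SIZE = 1024;

  // Bytes to request on the next read; never negative.
  // maxContent <= 0 is treated as "no limit", as in the original method.
  static int calculateTryToRead(int totalRead, int maxContent) {
    if (maxContent <= 0) {
      return BUFFER_SIZE;
    }
    int remaining = maxContent - totalRead;
    return Math.max(0, Math.min(BUFFER_SIZE, remaining));
  }

  // Copies at most maxContent bytes from in to out; returns the count copied.
  static int copyLimited(InputStream in, OutputStream out, int maxContent)
      throws IOException {
    byte[] buffer = new byte[BUFFER_SIZE];
    int totalRead = 0;
    int tryToRead = calculateTryToRead(totalRead, maxContent);
    int bufferFilled;
    // Check the budget first, then request only tryToRead bytes.
    while (tryToRead > 0
        && (bufferFilled = in.read(buffer, 0, tryToRead)) != -1) {
      totalRead += bufferFilled;
      out.write(buffer, 0, bufferFilled);
      tryToRead = calculateTryToRead(totalRead, maxContent);
    }
    return totalRead;
  }

  // Convenience wrapper for in-memory data, used for the demo below.
  static byte[] readLimited(byte[] data, int maxContent) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      copyLimited(new ByteArrayInputStream(data), out, maxContent);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    // 6000 bytes available, limit 5000.
    System.out.println(readLimited(new byte[6000], 5000).length); // prints 5000
  }
}
```

Putting the tryToRead > 0 check ahead of in.read() also avoids reading one more
chunk from the stream after the limit has already been reached.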