protocol-httpclient reading more bytes than http.content.limit
--------------------------------------------------------------

                 Key: NUTCH-560
                 URL: https://issues.apache.org/jira/browse/NUTCH-560
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 0.9.0, 1.0.0
            Reporter: Joseph M.


I modified HttpResponse.java in protocol-httpclient to download files to the 
file system. With http.content.limit set to 5000, it fetches around 5500 to 
6000 bytes instead and writes them to the file system. There is a calculation 
mistake in the calculateTryToRead() function.

{code}
        int tryAndRead = calculateTryToRead(totalRead);
        while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
            && tryAndRead > 0) {
          totalRead += bufferFilled;
          out.write(buffer, 0, bufferFilled);
          tryAndRead = calculateTryToRead(totalRead);
        }
{code}

The while loop stops when calculateTryToRead() returns zero or a negative 
value, or when the stream ends.

{code}
  private int calculateTryToRead(int totalRead) {
    int tryToRead = Http.BUFFER_SIZE;
    if (http.getMaxContent() <= 0) {
      return http.BUFFER_SIZE;
    } else if (http.getMaxContent() - totalRead < http.BUFFER_SIZE) {
      tryToRead = http.getMaxContent() - totalRead;
    }
    return tryToRead;
  }
{code}

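It returns a negative value when totalRead > http.getMaxContent(). Each 
in.read() call asks for buffer.length bytes regardless of how many bytes remain 
under the limit, so more bytes than http.content.limit have already been read 
by the time the while loop breaks.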
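
One possible way to keep the read within the limit is sketched below. This is 
only an illustration, not the actual Nutch patch: the class BoundedCopy and the 
methods remainingToRead() and copyWithLimit() are hypothetical names, and 
maxContent and bufferSize stand in for http.getMaxContent() and 
Http.BUFFER_SIZE. The idea is to check the remaining allowance before reading 
and to ask read() for at most that many bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedCopy {

  // Hypothetical stand-in for calculateTryToRead(): how many bytes may still
  // be requested from the stream. Never returns a negative number.
  public static int remainingToRead(int totalRead, int maxContent, int bufferSize) {
    if (maxContent <= 0) {
      return bufferSize; // no limit configured, as in the original method
    }
    return Math.max(0, Math.min(bufferSize, maxContent - totalRead));
  }

  // Copies at most maxContent bytes from in to out and returns the number of
  // bytes copied. A maxContent <= 0 means "no limit".
  public static int copyWithLimit(InputStream in, OutputStream out,
                                  int maxContent, int bufferSize) throws IOException {
    byte[] buffer = new byte[bufferSize];
    int totalRead = 0;
    int tryToRead = remainingToRead(totalRead, maxContent, bufferSize);
    int bufferFilled;
    // Check the allowance first, then request no more than tryToRead bytes.
    while (tryToRead > 0
        && (bufferFilled = in.read(buffer, 0, tryToRead)) != -1) {
      totalRead += bufferFilled;
      out.write(buffer, 0, bufferFilled);
      tryToRead = remainingToRead(totalRead, maxContent, bufferSize);
    }
    return totalRead;
  }

  public static void main(String[] args) throws IOException {
    byte[] page = new byte[20000]; // pretend response body
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int copied = copyWithLimit(new ByteArrayInputStream(page), out, 5000, 8192);
    System.out.println(copied + " bytes copied"); // prints "5000 bytes copied"
  }
}
```

With the limit set to 5000, this sketch stops at exactly 5000 bytes, because 
the last read() is capped at the bytes still allowed rather than at the full 
buffer size.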

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
