[ 
https://issues.apache.org/jira/browse/NUTCH-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530519
 ] 

Susam Pal commented on NUTCH-560:
---------------------------------

I analysed 'protocol-http' and it behaves in almost the same manner. While 
buffering, we cannot stop reading after exactly 'http.content.limit' bytes 
have been read. The limit check only reports that the limit has been exceeded 
one iteration after it was crossed. So, this doesn't seem like a bug. 
However, it doesn't take care of reading up to 'Content-Length' bytes, which 
NUTCH-559 does.
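
To illustrate the "one iteration after the limit" behaviour, here is a minimal 
sketch that replays the loop quoted in the issue below against an in-memory 
stream. The class name, the 1024-byte buffer and the 5000-byte limit are 
assumptions chosen for the demonstration, not values taken from Nutch. Note 
that the loop always requests buffer.length bytes, so the last accepted read 
can carry the total past the limit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class OvershootDemo {
  // Hypothetical values for illustration only.
  static final int BUFFER_SIZE = 1024;
  static final int MAX_CONTENT = 5000;

  // Same logic as the calculateTryToRead() quoted in the issue.
  static int calculateTryToRead(int totalRead) {
    int tryToRead = BUFFER_SIZE;
    if (MAX_CONTENT <= 0) {
      return BUFFER_SIZE;
    } else if (MAX_CONTENT - totalRead < BUFFER_SIZE) {
      tryToRead = MAX_CONTENT - totalRead;
    }
    return tryToRead;
  }

  // Replays the quoted loop and returns the number of bytes stored.
  static int readAll(InputStream in) {
    byte[] buffer = new byte[BUFFER_SIZE];
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int totalRead = 0;
    int bufferFilled;
    try {
      int tryAndRead = calculateTryToRead(totalRead);
      // The read always asks for a full buffer; tryAndRead is only a stop flag.
      while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
          && tryAndRead > 0) {
        totalRead += bufferFilled;
        out.write(buffer, 0, bufferFilled);
        tryAndRead = calculateTryToRead(totalRead);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.size();
  }

  public static void main(String[] args) {
    // 20000 bytes available, limit 5000, reads of 1024 bytes each.
    System.out.println(readAll(new ByteArrayInputStream(new byte[20000])));
  }
}
```

With these assumed values the loop stores 5120 bytes: after four reads the 
total is 4096, the remaining 904 is still positive, so a fifth full read is 
accepted before the check turns negative.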

> protocol-httpclient reading more bytes than http.content.limit
> --------------------------------------------------------------
>
>                 Key: NUTCH-560
>                 URL: https://issues.apache.org/jira/browse/NUTCH-560
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Joseph M.
>
> I modified protocol-httpclient HttpResponse.java to download files to the 
> file system. If I set http.content.limit to 5000, it fetches around 5500 to 
> 6000 bytes instead and writes them to the file system. There is a calculation 
> mistake in the calculateTryToRead() function.
> {code}
>         int tryAndRead = calculateTryToRead(totalRead);
>         while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
>             && tryAndRead > 0) {
>           totalRead += bufferFilled;
>           out.write(buffer, 0, bufferFilled);
>           tryAndRead = calculateTryToRead(totalRead);
>         }{code}
> The while loop stops when calculateTryToRead() returns a negative value or 0.
>   {code}private int calculateTryToRead(int totalRead) {
>     int tryToRead = Http.BUFFER_SIZE;
>     if (http.getMaxContent() <= 0) {
>       return http.BUFFER_SIZE;
>     } else if (http.getMaxContent() - totalRead < http.BUFFER_SIZE) {
>       tryToRead = http.getMaxContent() - totalRead;
>     }
>     return tryToRead;
>   }{code}
> It returns a negative value when totalRead > http.getMaxContent(), so more 
> bytes than http.content.limit are read before the while loop breaks.
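
If an exact cap were wanted, one possible approach is to pass the remaining 
byte count to read() instead of buffer.length. The following is only a sketch 
under that assumption; it is not the NUTCH-559 patch or any committed Nutch 
code, and the class name, method name and buffer size are hypothetical. As in 
the quoted function, a limit of 0 or less is treated as "no limit".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BoundedReadSketch {
  // Hypothetical buffer size for illustration only.
  static final int BUFFER_SIZE = 1024;

  // Reads at most maxContent bytes from in; maxContent <= 0 means unlimited.
  static byte[] readAtMost(InputStream in, int maxContent) {
    byte[] buffer = new byte[BUFFER_SIZE];
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int totalRead = 0;
    try {
      while (true) {
        int tryToRead = BUFFER_SIZE;
        if (maxContent > 0) {
          // Never request more than what is left under the limit.
          tryToRead = Math.min(BUFFER_SIZE, maxContent - totalRead);
        }
        if (tryToRead <= 0) {
          break; // limit reached exactly
        }
        int bufferFilled = in.read(buffer, 0, tryToRead);
        if (bufferFilled == -1) {
          break; // end of stream
        }
        out.write(buffer, 0, bufferFilled);
        totalRead += bufferFilled;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] body = readAtMost(new ByteArrayInputStream(new byte[20000]), 5000);
    System.out.println(body.length);
  }
}
```

Because the check happens before each read and the request size is clamped, 
the stored total can equal the limit but not exceed it.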

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
