[ https://issues.apache.org/jira/browse/NUTCH-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney closed NUTCH-560.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.0
         Assignee: Doğacan Güney

Fixed as part of NUTCH-559.

> protocol-httpclient reading more bytes than http.content.limit
> --------------------------------------------------------------
>
>                 Key: NUTCH-560
>                 URL: https://issues.apache.org/jira/browse/NUTCH-560
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Joseph M.
>            Assignee: Doğacan Güney
>             Fix For: 1.0.0
>
>
> I modified protocol-httpclient's HttpResponse.java to download files to the
> file system. If I set http.content.limit to 5000, it fetches around 5500 to
> 6000 bytes instead and writes all of them to the file system. There is a
> calculation mistake in the calculateTryToRead() function.
> {code}
>         int tryAndRead = calculateTryToRead(totalRead);
>         while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
>             && tryAndRead > 0) {
>           totalRead += bufferFilled;
>           out.write(buffer, 0, bufferFilled);
>           tryAndRead = calculateTryToRead(totalRead);
>         }{code}
> The while loop stops only when calculateTryToRead() returns a negative value or 0.
>   {code}private int calculateTryToRead(int totalRead) {
>     int tryToRead = Http.BUFFER_SIZE;
>     if (http.getMaxContent() <= 0) {
>       return http.BUFFER_SIZE;
>     } else if (http.getMaxContent() - totalRead < http.BUFFER_SIZE) {
>       tryToRead = http.getMaxContent() - totalRead;
>     }
>     return tryToRead;
>   }{code}
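> It returns a negative value once totalRead > http.getMaxContent(). Because each
> in.read() call asks for a full buffer.length bytes rather than at most
> tryAndRead, the last chunk pushes totalRead past http.content.limit, and that
> whole chunk is written before the while loop breaks.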
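The overshoot can be reproduced outside Nutch. The standalone Java sketch below copies the quoted loop and the calculateTryToRead() arithmetic, feeds them a simulated stream that returns at most 1460 bytes per read() call, and contrasts them with a capped variant that checks the limit before reading and never asks in.read() for more than tryAndRead bytes. The class and method names, the 8 KiB BUFFER_SIZE, the 1460-byte chunk size and the 20,000-byte body are all assumptions for illustration; the capped loop is one possible fix, not necessarily the change committed under NUTCH-559.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class ContentLimitDemo {
  // Assumed stand-in for Http.BUFFER_SIZE (8 KiB); the real constant may differ.
  static final int BUFFER_SIZE = 8 * 1024;

  /** Simulated socket stream: returns at most chunkSize bytes per read() call. */
  static class ChunkedInputStream extends FilterInputStream {
    private final int chunkSize;

    ChunkedInputStream(InputStream in, int chunkSize) {
      super(in);
      this.chunkSize = chunkSize;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      return super.read(b, off, Math.min(len, chunkSize));
    }
  }

  /** Same arithmetic as the quoted calculateTryToRead(), with the limit passed in. */
  static int calculateTryToRead(int maxContent, int totalRead) {
    if (maxContent <= 0) {
      return BUFFER_SIZE; // non-positive limit: read without truncation
    }
    // Goes negative once totalRead exceeds maxContent, as described in the report.
    return Math.min(BUFFER_SIZE, maxContent - totalRead);
  }

  /** The loop as quoted: always asks for buffer.length bytes, checks the limit afterwards. */
  static void copyAsReported(InputStream in, OutputStream out, int maxContent) throws IOException {
    byte[] buffer = new byte[BUFFER_SIZE];
    int totalRead = 0;
    int bufferFilled;
    int tryAndRead = calculateTryToRead(maxContent, totalRead);
    while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 && tryAndRead > 0) {
      totalRead += bufferFilled;
      out.write(buffer, 0, bufferFilled);
      tryAndRead = calculateTryToRead(maxContent, totalRead);
    }
  }

  /** Capped variant: checks the limit first and never requests more than tryAndRead bytes. */
  static void copyCapped(InputStream in, OutputStream out, int maxContent) throws IOException {
    byte[] buffer = new byte[BUFFER_SIZE];
    int totalRead = 0;
    int tryAndRead = calculateTryToRead(maxContent, totalRead);
    while (tryAndRead > 0) {
      int bufferFilled = in.read(buffer, 0, Math.min(buffer.length, tryAndRead));
      if (bufferFilled == -1) {
        break; // end of stream before the limit was reached
      }
      totalRead += bufferFilled;
      out.write(buffer, 0, bufferFilled);
      tryAndRead = calculateTryToRead(maxContent, totalRead);
    }
  }

  /** Returns how many bytes a loop writes for a zero-filled body of bodyLength bytes. */
  static int simulate(boolean capped, int bodyLength, int chunkSize, int maxContent) {
    InputStream in = new ChunkedInputStream(new ByteArrayInputStream(new byte[bodyLength]), chunkSize);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      if (capped) {
        copyCapped(in, out, maxContent);
      } else {
        copyAsReported(in, out, maxContent);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.size();
  }

  public static void main(String[] args) {
    // Hypothetical 20,000-byte body, 1460 bytes per read(), http.content.limit = 5000.
    System.out.println("as reported: " + simulate(false, 20_000, 1460, 5000)); // 5840
    System.out.println("capped:      " + simulate(true, 20_000, 1460, 5000));  // 5000
  }
}
```

With these assumed values the quoted loop writes 5840 bytes, in line with the 5500 to 6000 bytes in the report, while the capped loop writes exactly 5000. Passing Math.min(buffer.length, tryAndRead) to read() keeps the final chunk from crossing the limit. Checking tryAndRead before calling read() also avoids pulling one extra chunk off the stream, which the quoted loop reads and then discards.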

-- 
This message is automatically generated by JIRA.
