[ 
https://issues.apache.org/jira/browse/NUTCH-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792070#comment-16792070
 ] 

ASF GitHub Bot commented on NUTCH-2699:
---------------------------------------

sebastian-nagel commented on pull request #445: NUTCH-2699 Protocol-okhttp: 
needless loops to increment requested bytes counter when more content is 
already buffered
URL: https://github.com/apache/nutch/pull/445
 
 
   Significantly fewer loop iterations are now executed:
   ```
   2019-03-13 17:46:25,232 DEBUG okhttp.OkHttpResponse - http://localhost/large.pdf - http/1.1 200 OK
   2019-03-13 17:46:25,233 DEBUG okhttp.OkHttpResponse - total bytes requested = 8192, buffered = 16088
   2019-03-13 17:46:25,233 DEBUG okhttp.OkHttpResponse - total bytes requested = 24280, buffered = 24280
   2019-03-13 17:46:25,233 DEBUG okhttp.OkHttpResponse - total bytes requested = 32472, buffered = 32472
   2019-03-13 17:46:25,233 DEBUG okhttp.OkHttpResponse - total bytes requested = 40664, buffered = 40664
   2019-03-13 17:46:25,233 DEBUG okhttp.OkHttpResponse - total bytes requested = 48856, buffered = 48856
   2019-03-13 17:46:25,233 DEBUG okhttp.OkHttpResponse - total bytes requested = 57048, buffered = 57048
   2019-03-13 17:46:25,233 DEBUG okhttp.OkHttpResponse - total bytes requested = 65240, buffered = 65240
   2019-03-13 17:46:25,233 DEBUG okhttp.OkHttpResponse - total bytes requested = 65534, buffered = 73432
   2019-03-13 17:46:25,233 DEBUG okhttp.OkHttpResponse - content limit reached
   2019-03-13 17:46:25,233 DEBUG okhttp.OkHttpResponse - copied 65534 bytes out of 73432 buffered, remaining 7898 bytes in buffer
   2019-03-13 17:46:25,234 DEBUG okhttp.OkHttpResponse - HTTP content truncated to 65534 bytes (reason: LENGTH)
   2019-03-13 17:46:25,249 INFO  parse.ParseSegment - http://localhost/large.pdf skipped. Content of size 366578 was truncated to 65534
   2019-03-13 17:46:25,249 WARN  parse.ParserChecker - Content is truncated, parse may fail!
   ```
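
For readers who want to see why the iteration count drops, here is a minimal, self-contained simulation. It is a sketch inferred from the numbers in the log output only, not the actual plugin code: `RequestCounterSketch`, `SimulatedSource`, and `requestedValues` are hypothetical names, and the size of the first read (7896 bytes) is an assumption chosen so that the toy source reproduces the logged `buffered` values. The step (8192) and the content limit (65534) are taken from the logs.

```java
import java.util.ArrayList;
import java.util.List;

public class RequestCounterSketch {

  static final long STEP = 8192;        // growth step visible in the logs
  static final long LIMIT = 65534;      // content limit visible in the logs
  static final long FIRST_READ = 7896;  // ASSUMPTION: picked to match the logged buffer sizes

  /** Toy stand-in for a buffering source; NOT the okhttp/okio API. */
  static final class SimulatedSource {
    long buffered = 0;

    /** Reads whole chunks until at least byteCount bytes are buffered. */
    void request(long byteCount) {
      while (buffered < byteCount) {
        buffered += (buffered == 0) ? FIRST_READ : STEP;
      }
    }
  }

  /**
   * Returns the successive "total bytes requested" values.
   * jumpToBuffered = false: the counter grows from its own previous value.
   * jumpToBuffered = true: the counter first jumps to the buffered size.
   */
  static List<Long> requestedValues(boolean jumpToBuffered) {
    SimulatedSource source = new SimulatedSource();
    List<Long> requested = new ArrayList<>();
    long bytesRequested = 0;
    while (source.buffered < LIMIT) {
      long base = jumpToBuffered
          ? Math.max(bytesRequested, source.buffered)
          : bytesRequested;
      // Near the limit the step shrinks to (LIMIT - buffered), e.g. 294 bytes.
      bytesRequested = base + Math.min(STEP, LIMIT - source.buffered);
      source.request(bytesRequested);
      requested.add(bytesRequested);
    }
    return requested;
  }

  public static void main(String[] args) {
    System.out.println("old: " + requestedValues(false).size() + " iterations");
    System.out.println("new: " + requestedValues(true).size() + " iterations");
  }
}
```

Under these assumptions the counter that grows from its own previous value needs 34 iterations, because it creeps forward in 294-byte steps while 65240 bytes are already buffered. Jumping to the buffered size first needs 8 iterations, which matches the number of "total bytes requested" lines in the log above.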
 


> Protocol-okhttp: needless loops to increment requested bytes counter when 
> more content is already buffered
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2699
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2699
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> The okhttp library used by the plugin protocol-okhttp buffers content 
> internally and has often already buffered more content than was requested. 
> The plugin should immediately set the request counter to the size of the 
> buffered content. This avoids needless loop iterations when the buffered size 
> comes close to the content limit, where the increment steps become too small:
> {noformat}
> 2019-03-11 14:56:36,642 DEBUG okhttp.OkHttpResponse - http://localhost/large.pdf - http/1.1 200 OK
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 8192, buffered = 16088
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 16384, buffered = 24280
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 24576, buffered = 32472
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 32768, buffered = 40664
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 40960, buffered = 48856
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 49152, buffered = 57048
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57344, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57638, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57932, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58226, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58520, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58814, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59108, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59402, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59696, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59990, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60284, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60578, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60872, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61166, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61460, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61754, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62048, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62342, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62636, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62930, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63224, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63518, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63812, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64106, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64400, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64694, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64988, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 65282, buffered = 73432
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - content limit reached
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - copied 65534 bytes out of 73432 buffered, remaining buffer contains 7898 bytes
> 2019-03-11 14:56:36,645 DEBUG okhttp.OkHttpResponse - HTTP content truncated to 65534 bytes (reason: LENGTH)
> 2019-03-11 14:56:36,661 INFO parse.ParseSegment - http://localhost/large.pdf skipped. Content of size 366578 was truncated to 65534
> 2019-03-11 14:56:36,661 WARN parse.ParserChecker - Content is truncated, parse may fail!
> {noformat}



