Sebastian Nagel created NUTCH-2729:
--------------------------------------
Summary: protocol-okhttp: fix marking of truncated content
Key: NUTCH-2729
URL: https://issues.apache.org/jira/browse/NUTCH-2729
Project: Nutch
Issue Type: Bug
Components: plugin, protocol
Affects Versions: 1.15
Reporter: Sebastian Nagel
Fix For: 1.16
The plugin protocol-okhttp marks content as "truncated" and records the
reason for the truncation: content limit exceeded, time limit exceeded, or
network disconnect during fetch.
The detection of truncation by content limit has one bug: if the fetched
content is exactly the size of the content limit, the loop that requests more
content is exited and it remains unknown whether more content would have
followed. Instead, the loop should continue and request one more byte to
reliably detect whether the content is truncated or not.
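
The "one byte more" idea can be sketched as follows. This is only an
illustration, not the plugin's actual code: the class TruncationSketch, the
method readWithLimit and the Result holder are hypothetical names, the input
is a plain java.io.InputStream rather than an OkHttp response body, and the
limit is assumed to be non-negative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical sketch, not the protocol-okhttp implementation. */
public class TruncationSketch {

  /** Kept bytes plus a flag telling whether more content was available. */
  public static final class Result {
    public final byte[] content;
    public final boolean truncated;

    Result(byte[] content, boolean truncated) {
      this.content = content;
      this.truncated = truncated;
    }
  }

  /**
   * Reads at most maxContent bytes from the stream. To decide whether the
   * content is truncated, up to maxContent + 1 bytes are requested: only if
   * that extra byte arrives is the content known to exceed the limit.
   * Assumes maxContent >= 0.
   */
  public static Result readWithLimit(InputStream in, int maxContent)
      throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    long toRead = (long) maxContent + 1; // limit plus one probe byte
    long total = 0;
    while (total < toRead) {
      int len = (int) Math.min(buf.length, toRead - total);
      int n = in.read(buf, 0, len);
      if (n == -1) {
        break; // end of stream reached before the probe byte
      }
      out.write(buf, 0, n);
      total += n;
    }
    boolean truncated = total > maxContent;
    byte[] all = out.toByteArray();
    // Drop the probe byte so that at most maxContent bytes are kept.
    byte[] content = truncated ? Arrays.copyOf(all, maxContent) : all;
    return new Result(content, truncated);
  }
}
```

With this approach, a document of exactly maxContent bytes hits the end of
the stream while the probe byte is requested and is not marked as truncated,
whereas a document of maxContent + 1 bytes or more is.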
Note that the Content-Length header cannot be used to determine truncation
reliably: it does not indicate the real content length for compressed or
chunked content, and it might simply be wrong.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)