Sebastian Nagel created NUTCH-2729:

             Summary: protocol-okhttp: fix marking of truncated content
                 Key: NUTCH-2729
             Project: Nutch
          Issue Type: Bug
          Components: plugin, protocol
    Affects Versions: 1.15
            Reporter: Sebastian Nagel
             Fix For: 1.16

The plugin protocol-okhttp marks content as "truncated" and records the reason 
for the truncation: content limit exceeded, time limit exceeded, or network 
disconnect during fetch.

The detection of truncation by content limit has one bug: if the fetched 
content is exactly the size of the content limit, the loop that requests more 
content is exited. Instead, the loop should continue and request one more byte 
to reliably detect whether the content is truncated or not.
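
The "one byte more" idea can be sketched in plain Java as follows. This is 
only an illustration on a generic InputStream, not the plugin's actual code: 
the class TruncationCheck, the method readWithLimit and the parameter 
maxContent are hypothetical names, and maxContent is assumed to be 
non-negative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Nutch or protocol-okhttp.
final class TruncationCheck {

  /** Holds the kept bytes and whether the source had more than the limit. */
  static final class Result {
    final byte[] content;
    final boolean truncated;

    Result(byte[] content, boolean truncated) {
      this.content = content;
      this.truncated = truncated;
    }
  }

  /**
   * Reads at most maxContent bytes of content, but tries to read
   * maxContent + 1 bytes so that "exactly at the limit" can be told
   * apart from "longer than the limit".
   */
  static Result readWithLimit(InputStream in, int maxContent)
      throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    // Ask for one byte beyond the limit (long avoids int overflow).
    long remaining = (long) maxContent + 1;
    while (remaining > 0) {
      int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
      if (n == -1) {
        break; // end of stream reached before limit + 1 bytes
      }
      out.write(buf, 0, n);
      remaining -= n;
    }
    byte[] all = out.toByteArray();
    // Only the extra byte proves that content was cut off.
    boolean truncated = all.length > maxContent;
    byte[] content = truncated ? Arrays.copyOf(all, maxContent) : all;
    return new Result(content, truncated);
  }
}
```

With a limit of 10, a 10-byte source is returned whole and not marked as 
truncated, while an 11-byte source is cut to 10 bytes and marked as truncated.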

Note that the Content-Length header cannot be used to determine truncation 
reliably: it does not indicate the real content length for compressed or 
chunked content, and it might simply be wrong.
