[ 
https://issues.apache.org/jira/browse/NUTCH-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920360#comment-16920360
 ] 

Hudson commented on NUTCH-2729:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-trunk #3639 (See 
[https://builds.apache.org/job/Nutch-trunk/3639/])
NUTCH-2729 protocol-okhttp: fix marking of truncated content (snagel: [https://github.com/apache/nutch/commit/a82a663881fc3bb05c6e8bf6bd80fae22e36a069])
* (edit) src/plugin/protocol-okhttp/src/java/org/apache/nutch/protocol/okhttp/OkHttpResponse.java
* (edit) src/plugin/protocol-okhttp/src/test/org/apache/nutch/protocol/okhttp/TestBadServerResponses.java
NUTCH-2729 protocol-okhttp: fix marking of truncated content - avoid (snagel: [https://github.com/apache/nutch/commit/5c45172bc86f30e24c6dfe126b10274cd9f799bf])
* (edit) src/plugin/protocol-okhttp/src/test/org/apache/nutch/protocol/okhttp/TestBadServerResponses.java
* (edit) src/plugin/protocol-okhttp/src/java/org/apache/nutch/protocol/okhttp/OkHttpResponse.java
NUTCH-2729 protocol-okhttp: fix marking of truncated content - log (snagel: [https://github.com/apache/nutch/commit/efcafb67b43938f70354ca57e816c708b92815fe])
* (edit) src/plugin/protocol-okhttp/src/test/org/apache/nutch/protocol/okhttp/TestBadServerResponses.java
* (edit) src/plugin/protocol-okhttp/src/java/org/apache/nutch/protocol/okhttp/OkHttpResponse.java


> protocol-okhttp: fix marking of truncated content
> -------------------------------------------------
>
>                 Key: NUTCH-2729
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2729
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, protocol
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> The protocol-okhttp plugin marks content as "truncated" and records the
> reason for the truncation: content limit exceeded, time limit exceeded, or
> network disconnect during fetch.
> The detection of truncation by content limit has a bug: if the fetched
> content is exactly the size of the content limit, the loop that requests
> more content is exited. Instead, it should continue by requesting one more
> byte to reliably detect whether the content is truncated or not.
> Note that the Content-Length header cannot be used to determine truncation
> reliably: it does not indicate the real content length for compressed or
> chunked content, and it might be wrong.
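
The "request one more byte" idea from the description can be sketched as
follows. This is only an illustration under stated assumptions, not the code
from the commits above: the class TruncationSketch, the Result holder and
readLimited are hypothetical names, a plain java.io.InputStream stands in for
the response body, and maxContent is assumed to be non-negative and smaller
than Integer.MAX_VALUE.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical sketch, not the actual OkHttpResponse code. */
public class TruncationSketch {

  /** Hypothetical holder: the kept bytes plus a truncation flag. */
  static final class Result {
    final byte[] content;
    final boolean truncated;

    Result(byte[] content, boolean truncated) {
      this.content = content;
      this.truncated = truncated;
    }
  }

  /**
   * Reads at most maxContent bytes of content, but asks the stream for up to
   * maxContent + 1 bytes. Only if that extra byte arrives is the content
   * known to exceed the limit.
   * Assumes 0 <= maxContent < Integer.MAX_VALUE.
   */
  static Result readLimited(InputStream in, int maxContent) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    int wanted = maxContent + 1; // one byte beyond the limit
    while (out.size() < wanted) {
      int n = in.read(buf, 0, Math.min(buf.length, wanted - out.size()));
      if (n == -1) {
        break; // end of stream reached before exceeding the limit
      }
      out.write(buf, 0, n);
    }
    byte[] all = out.toByteArray();
    if (all.length > maxContent) {
      // The extra byte was available: drop it and mark as truncated.
      return new Result(Arrays.copyOf(all, maxContent), true);
    }
    return new Result(all, false);
  }

  /** Convenience overload for in-memory data, used for the demo below. */
  static Result readLimited(byte[] data, int maxContent) {
    try {
      return readLimited(new ByteArrayInputStream(data), maxContent);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Content exactly at the limit: complete, not truncated.
    System.out.println(readLimited(new byte[10], 10).truncated);
    // Content one byte over the limit: truncated to 10 bytes.
    System.out.println(readLimited(new byte[11], 10).truncated);
  }
}
```

Stopping the loop as soon as exactly maxContent bytes have been read leaves
both cases in main indistinguishable. The extra byte is read only to decide
the flag and is discarded afterwards.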



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
