[ 
https://issues.apache.org/jira/browse/NUTCH-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2548.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4

Thanks, [~rustyx]! Applied the patch and merged the PR (sorry, I applied the
patch first and missed the PR).

> Compressed content skipped. Content of size 78 was truncated to 74
> ------------------------------------------------------------------
>
>                 Key: NUTCH-2548
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2548
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 2.4
>            Reporter: Rustam
>            Priority: Major
>             Fix For: 2.4
>
>         Attachments: nutch-content-truncated.patch
>
>
> Gzip- or deflate-compressed content fails to parse with a message like:
> {{WARN  parse.ParserJob - https://rustyx.org/temp/index%20bbb skipped. 
> Content of size 78 was truncated to 74}}
> The root cause is that the original (compressed) Content-Length is stored in 
> the headers, while the content is stored uncompressed. Consequently the 
> Content-Length doesn't match the stored content size, and the parser treats 
> the content as truncated.
> See the attached patch, which fixes the issue by removing Content-Length from 
> the headers if it holds the compressed size.
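
The idea described above can be sketched roughly as follows. This is not the
attached patch, only a minimal illustration of dropping Content-Length when
the response was compressed. The class and method names
(ContentLengthSketch, isCompressed, headersForStorage) are hypothetical and
are not part of the Nutch API, and the set of encodings checked is an
assumption.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not taken from the Nutch code base.
final class ContentLengthSketch {

    /** True if the Content-Encoding value names a compression handled by this sketch. */
    static boolean isCompressed(String contentEncoding) {
        if (contentEncoding == null) {
            return false;
        }
        String enc = contentEncoding.trim().toLowerCase(Locale.ROOT);
        return enc.equals("gzip") || enc.equals("x-gzip") || enc.equals("deflate");
    }

    /**
     * Returns a copy of the response headers to store next to the
     * uncompressed body. Content-Length is dropped when the response was
     * compressed, because it counts compressed bytes and would not match
     * the size of the stored content.
     */
    static Map<String, String> headersForStorage(Map<String, String> responseHeaders) {
        // HTTP header names are case-insensitive, so look them up that way.
        Map<String, String> stored = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        stored.putAll(responseHeaders);
        if (isCompressed(stored.get("Content-Encoding"))) {
            stored.remove("Content-Length");
        }
        return stored;
    }
}
```

With Content-Length gone, a later size check has no compressed byte count to
compare against the uncompressed content. Headers of uncompressed responses
are left untouched, so genuine truncation can still be detected for them.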



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
