[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936103#comment-13936103 ]
ysc commented on NUTCH-1736:
----------------------------
problem:
fetching:
http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null
detail:
2014-03-12 16:48:38,031 ERROR http.Http - Failed to get protocol output
java.io.IOException: unzipBestEffort returned null
    at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:317)
    at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:164)
    at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:140)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:703)
2014-03-12 16:48:38,031 INFO fetcher.Fetcher - fetch of http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html failed with: java.io.IOException: unzipBestEffort returned null
2014-03-12 16:48:38,031 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
solution:
This patch handles HTTP responses whose header contains Transfer-Encoding: chunked.
important tip:
The property http.content.limit in nutch-site.xml must be greater than 0.
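
For readers unfamiliar with the header mentioned in the solution, here is a minimal sketch of how a Transfer-Encoding: chunked body is laid out and decoded according to the HTTP/1.1 chunked coding. It is not the code from the attached patches and does not use any Nutch API; the class name ChunkedBodySketch and the methods decode and readLine are hypothetical names chosen for illustration. If the chunk-size lines are left in the body, a later gzip step would see bytes that are not gzip data, which would be consistent with the unzipBestEffort failure above, though that link is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative only: hypothetical helper, not the code in the attached patches. */
final class ChunkedBodySketch {

  /** Reads one line as ASCII and returns it without the CR/LF terminator. */
  static String readLine(InputStream in) throws IOException {
    StringBuilder line = new StringBuilder();
    int c;
    while ((c = in.read()) != -1) {
      if (c == '\n') {
        break;
      }
      if (c != '\r') {
        line.append((char) c);
      }
    }
    if (c == -1 && line.length() == 0) {
      throw new EOFException("unexpected end of stream");
    }
    return line.toString();
  }

  /** Decodes a chunked message body into the raw payload bytes. */
  static byte[] decode(InputStream in) throws IOException {
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    while (true) {
      // Each chunk starts with its size in hexadecimal, optionally
      // followed by ';' and chunk extensions, which are ignored here.
      String sizeLine = readLine(in);
      int semi = sizeLine.indexOf(';');
      if (semi >= 0) {
        sizeLine = sizeLine.substring(0, semi);
      }
      int size = Integer.parseInt(sizeLine.trim(), 16);
      if (size < 0) {
        throw new IOException("negative chunk size");
      }
      if (size == 0) {
        break; // a zero-sized chunk marks the end of the body
      }
      byte[] chunk = new byte[size];
      int off = 0;
      while (off < size) {
        int n = in.read(chunk, off, size - off);
        if (n == -1) {
          throw new EOFException("chunk truncated");
        }
        off += n;
      }
      body.write(chunk, 0, size);
      readLine(in); // consume the CRLF that follows the chunk data
    }
    // Skip optional trailer headers up to the terminating blank line.
    while (!readLine(in).isEmpty()) {
      // trailer fields are ignored in this sketch
    }
    return body.toByteArray();
  }
}
```

For example, passing the bytes of "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n" to decode returns the bytes of "Wikipedia". The sketch buffers the whole body in memory and applies no size cap; how the patch interacts with http.content.limit is not shown here.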
> can't fetch page if http response header contains Transfer-Encoding:chunked
> ---------------------------------------------------------------------------
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
> Reporter: ysc
> Priority: Critical
> Attachments: nutch-2.2.1.patch, nutch1.7.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null