---------- Forwarded message ----------
From: Simone Frenzel <[email protected]>
Date: 2011/8/22
Subject: Patch for httpResponse
To: [email protected]


Hi,

I tested Nutch on different web pages. For short gzip-compressed pages it
throws an IOException:
java.io.IOException: unzipBestEffort returned null
2011-08-19 17:06:55,190 ERROR httpclient.Http - at
org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:310)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at
org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:163)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at
org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at
org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:138)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:628)

A small change to HttpResponse solves the problem: there are no more
problems with zipped pages, BasicAuth combined with zipped pages, etc.

Patch is attached.

Greetings and thanks
Index: trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java
===================================================================
--- trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java	(Revision 1160266)
+++ trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java	(Arbeitskopie)
@@ -124,7 +124,7 @@
         int totalRead = 0;
         ByteArrayOutputStream out = new ByteArrayOutputStream();
         while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
-            && totalRead + bufferFilled < contentLength) {
+            && totalRead + bufferFilled <= contentLength) {
           totalRead += bufferFilled;
           out.write(buffer, 0, bufferFilled);
         }
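
For readers who want to see the effect of the one-character change, here is a minimal standalone sketch of the read loop. It is not the Nutch code itself: the class name ReadLoopSketch, the helper readBody, the 8192-byte buffer and the 100-byte dummy body are all assumptions made for illustration. It also assumes that contentLength equals the exact size of the body, which is presumably the situation with short pages that arrive in a single read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class, not part of Nutch.
class ReadLoopSketch {

  // Copies the body from `in`, limited by contentLength.
  // inclusive == false mimics the old comparison (<),
  // inclusive == true mimics the patched comparison (<=).
  public static byte[] readBody(InputStream in, int contentLength,
                                boolean inclusive) throws IOException {
    byte[] buffer = new byte[8192]; // assumed buffer size
    int bufferFilled;
    int totalRead = 0;
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
        && (inclusive
            ? totalRead + bufferFilled <= contentLength
            : totalRead + bufferFilled < contentLength)) {
      totalRead += bufferFilled;
      out.write(buffer, 0, bufferFilled);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] body = new byte[100]; // stand-in for a short compressed page
    int before = readBody(new ByteArrayInputStream(body), body.length, false).length;
    int after = readBody(new ByteArrayInputStream(body), body.length, true).length;
    // prints "old (<): 0 bytes, patched (<=): 100 bytes"
    System.out.println("old (<): " + before + " bytes, patched (<=): " + after + " bytes");
  }
}
```

With the strict comparison, the read that reaches contentLength exactly is discarded, so a body that fits into one buffer ends up empty. That would explain why decompression then has nothing to work with.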
