Some Deflate-encoded pages not fetched
--------------------------------------
Key: NUTCH-1270
URL: https://issues.apache.org/jira/browse/NUTCH-1270
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 1.4
Environment: software
Reporter: behnam nikbakht
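Some web pages are fetched, but their content cannot be retrieved.
With the change below, this error is fixed.
We changed lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java so that, if inflating with the configured maximum content size (getMaxContent()) returns null, processDeflateEncoded retries with a fixed limit of 200000 bytes before giving up: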
 public byte[] processDeflateEncoded(byte[] compressed, URL url)
     throws IOException {
   if (LOGGER.isTraceEnabled()) { LOGGER.trace("inflating...."); }
   byte[] content = DeflateUtils.inflateBestEffort(compressed,
       getMaxContent());
+  // If inflating with the configured limit returns null, retry with a fixed 200000-byte limit.
+  if (content == null)
+    content = DeflateUtils.inflateBestEffort(compressed, 200000);
   if (content == null)
     throw new IOException("inflateBestEffort returned null");
   if (LOGGER.isTraceEnabled()) {
     LOGGER.trace("fetched " + compressed.length
         + " bytes of compressed content (expanded to "
         + content.length + " bytes) from " + url);
   }
   return content;
 }
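The standalone Java sketch below illustrates the control flow of the patch outside Nutch. It is not Nutch's code: DeflateFallbackSketch, rawDeflate, and the local inflateBestEffort are hypothetical stand-ins. The sketch assumes raw (headerless) deflate input and, purely for illustration, that the helper returns null for a non-positive limit or corrupt data; the issue itself does not say why the first call returns null.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateFallbackSketch {

  // Fallback limit taken from the patch in the issue.
  static final int FALLBACK_LIMIT = 200000;

  // HYPOTHETICAL stand-in for Nutch's DeflateUtils.inflateBestEffort.
  // ASSUMPTIONS: input is raw (headerless) deflate, and this helper returns
  // null for a non-positive limit or corrupt data. Nutch's helper may differ.
  static byte[] inflateBestEffort(byte[] in, int sizeLimit) {
    if (sizeLimit <= 0) {
      return null;
    }
    Inflater inflater = new Inflater(true); // true = no zlib header/trailer
    // The Inflater javadoc asks for an extra "dummy" input byte when nowrap is true.
    inflater.setInput(Arrays.copyOf(in, in.length + 1));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      // Stop at end of stream or once sizeLimit bytes have been produced.
      while (!inflater.finished() && out.size() < sizeLimit) {
        int n = inflater.inflate(buf);
        if (n == 0) {
          break; // no progress, e.g. input exhausted or a dictionary is required
        }
        out.write(buf, 0, Math.min(n, sizeLimit - out.size()));
      }
    } catch (DataFormatException e) {
      return null;
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  // Same control flow as the patched processDeflateEncoded, minus logging.
  static byte[] processDeflateEncoded(byte[] compressed, int maxContent)
      throws IOException {
    byte[] content = inflateBestEffort(compressed, maxContent);
    if (content == null) {
      content = inflateBestEffort(compressed, FALLBACK_LIMIT);
    }
    if (content == null) {
      throw new IOException("inflateBestEffort returned null");
    }
    return content;
  }

  // Test helper: produce raw deflate data with java.util.zip.Deflater.
  static byte[] rawDeflate(byte[] data) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(data);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] original = "Hello, Deflate!".repeat(100).getBytes(StandardCharsets.UTF_8);
    byte[] compressed = rawDeflate(original);
    // maxContent = -1 makes the first attempt return null (per the assumption
    // above), so the fixed fallback limit is what succeeds.
    byte[] content = processDeflateEncoded(compressed, -1);
    System.out.println(content.length);                   // 1500
    System.out.println(Arrays.equals(original, content)); // true
  }
}
```

Under these assumptions, processDeflateEncoded(compressed, -1) takes the fallback path and returns all 1500 bytes, while a small positive limit such as 10 truncates the output instead of falling back. String.repeat needs Java 11 or later. One design consideration: a hard-coded 200000 overrides whatever limit the user configured, so a fallback derived from the configuration may be preferable.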