Hi all,
    if (http.getMaxContent() >= 0
        && contentLength > http.getMaxContent()) // limit download size
      contentLength = http.getMaxContent();
.......
    for (int i = in.read(bytes); i != -1 && length + i <= contentLength;
        i = in.read(bytes)) {
      out.write(bytes, 0, i);
      length += i;
    }
So Nutch works like this: if "http.content.limit < contentLength", the
content is truncated. Then, if isTruncated() returns true in ParserJob, the
content is not parsed.
Why is the content read at all? If truncated content is never parsed, I
think we should skip the fetch when "http.content.limit < contentLength".
If you agree, I can implement a patch.
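
To make the idea concrete, here is a rough sketch of the check I have in mind. The class and method names (ContentLimitCheck, shouldSkipFetch, readBody) are made up for illustration and are not Nutch's actual API. It also assumes the server sends a Content-Length header; when the length is unknown, the existing truncation logic would still be needed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration only; not part of Nutch. */
final class ContentLimitCheck {

  /**
   * Returns true when the declared content length exceeds the configured limit.
   * A negative maxContent means "no limit", matching the check in the snippet above.
   */
  static boolean shouldSkipFetch(long contentLength, long maxContent) {
    return maxContent >= 0 && contentLength > maxContent;
  }

  /**
   * Reads the body only when it is within the limit.
   * Returns null when the fetch is skipped, so no bytes are read at all.
   * Assumption: contentLength comes from the Content-Length header.
   */
  static byte[] readBody(InputStream in, long contentLength, long maxContent)
      throws IOException {
    if (shouldSkipFetch(contentLength, maxContent)) {
      return null; // too large: would be truncated and never parsed anyway
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] bytes = new byte[4096];
    for (int i = in.read(bytes); i != -1; i = in.read(bytes)) {
      out.write(bytes, 0, i);
    }
    return out.toByteArray();
  }
}
```

With this, a 100-byte response against a 64-byte limit would be skipped without reading the stream, while a 10-byte response would be read in full.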