Hi,

Isn't that an effect of
<property>
  <name>http.content.limit</name>
  <value>65536</value>
  <description>The length limit for downloaded content using the http://
  protocol, in bytes. If this value is nonnegative (>=0), content longer
  than it will be truncated; otherwise, no truncation at all. Do not
  confuse this setting with the file.content.limit setting.
  </description>
</property>

I can't reproduce the problem, as http://search.dangdang.com/ seems to be down. Do you have another URL to illustrate the issue? If truncation is the cause, overriding http.content.limit in nutch-site.xml should fix it (see the sketch after the quoted message).

J.

On 16 September 2014 15:59, zeroleaf <[email protected]> wrote:

> These days, when using Nutch, I found that if the Transfer-Encoding is
> chunked, Nutch does not fetch the whole page, only part of it. Is this
> the intended behaviour in Nutch or is it a bug? If it is intended, how
> can it be configured to fetch the whole page?
>
> For example, add the URL below to the seed dir:
>
> http://search.dangdang.com/?key=%CA%FD%BE%DD%BF%E2
>
> then look at the fetched HTML in the content and you will find it is only
> part of the page. In addition, the versions I tested are Nutch 1.x
> (1.9 and 1.10).
>
> Thanks.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
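If truncation by http.content.limit is indeed the cause, a minimal override in conf/nutch-site.xml would look roughly like the sketch below. This is only a sketch, assuming a stock Nutch 1.x install: per the description quoted above, any negative value disables truncation entirely; alternatively, set a byte limit larger than the 65536 default.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- conf/nutch-site.xml: local overrides of the defaults in nutch-default.xml -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- A negative value means "no truncation at all";
         otherwise content is cut off at this many bytes. -->
    <value>-1</value>
  </property>
</configuration>

Note that the new limit only applies to pages fetched after the change; segments fetched earlier keep the truncated content until the pages are re-fetched.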

