Hi,

Isn't that an effect of
<property>
  <name>http.content.limit</name>
  <value>65536</value>
  <description>The length limit for downloaded content using the http://
  protocol, in bytes. If this value is nonnegative (>=0), content longer
  than it will be truncated; otherwise, no truncation at all. Do not
  confuse this setting with the file.content.limit setting.
  </description>
</property>

I can't reproduce the problem, as http://search.dangdang.com/ seems to be down. Do you have another URL to illustrate the issue? If truncation is the cause, overriding http.content.limit in nutch-site.xml should fix it (see the sketch after the quoted message).

J.

On 16 September 2014 15:59, zeroleaf <[email protected]> wrote:

> These days, when using Nutch, I found that if the Transfer-Encoding is
> chunked, Nutch does not fetch the whole page, only part of it. Is this
> the intended behaviour in Nutch or is it a bug? If it is intended, how
> can it be configured to fetch the whole page?
>
> For example, add the URL below to the seed dir:
>
> http://search.dangdang.com/?key=%CA%FD%BE%DD%BF%E2
>
> then look at the fetched HTML in the content and you will find it is only
> part of the page. In addition, the versions I tested are Nutch 1.x
> (1.9 and 1.10).
>
> Thanks.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
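If truncation by http.content.limit is indeed the cause, a minimal override in conf/nutch-site.xml would look roughly like the sketch below. This is only a sketch, assuming a stock Nutch 1.x install: per the description quoted above, any negative value disables truncation entirely; alternatively, set a byte limit larger than the 65536 default.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- conf/nutch-site.xml: local overrides of the defaults in nutch-default.xml -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- A negative value means "no truncation at all";
         otherwise content is cut off at this many bytes. -->
    <value>-1</value>
  </property>
</configuration>

Note that the new limit only applies to pages fetched after the change; segments fetched earlier keep the truncated content until the pages are re-fetched.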

