hey, my crawler is throwing a java.io.IOException after about 40-50 minutes of crawling. Are you guys facing this issue too?
On Sun, Feb 15, 2015 at 10:10 AM, Renxia Wang <[email protected]> wrote:
> Hi all,
>
> I am running Nutch on my own laptop and I'd like to set a limit for the
> (ftp|http).content.size so that the crawl will not download huge files
> for a long time and possibly cause a Java heap size issue. However, I
> wonder if downloading files (especially compressed files, like zip, rar,
> etc.) only partially can break the parsing and deduplication processing,
> since the file is incomplete?
>
> Thanks,
>
> Renxia
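
For reference, a per-document size cap like the one Renxia describes would normally go in conf/nutch-site.xml. The sketch below assumes the http.content.limit and ftp.content.limit properties as defined in nutch-default.xml (the "(ftp|http).content.size" name in the mail looks like a paraphrase); the property names, defaults, and the 1 MB value here should be checked against your Nutch version before use.

    <!-- Sketch of a nutch-site.xml override capping the bytes fetched per
         document. Property names assumed from nutch-default.xml
         (http.content.limit / ftp.content.limit); verify for your version. -->
    <property>
      <name>http.content.limit</name>
      <!-- maximum bytes downloaded per document over HTTP; -1 means no limit -->
      <value>1048576</value>
    </property>
    <property>
      <name>ftp.content.limit</name>
      <!-- the same cap for FTP fetches -->
      <value>1048576</value>
    </property>

Whether a file truncated at that limit can still be parsed or deduplicated correctly is a separate question from the config itself, which only controls how many bytes the fetcher downloads.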

