hey, my crawler is throwing a java.io.IOException after about 40-50 minutes of crawling. Are you guys facing this issue too?
On Sun, Feb 15, 2015 at 10:10 AM, Renxia Wang <[email protected]> wrote:
> Hi all,
>
> I am running Nutch on my own laptop and I'd like to set a limit for the
> (ftp|http).content.size so that the crawl will not download huge files
> for a long time and possibly cause a Java heap size issue. However, I
> wonder if downloading files (especially compressed files, like zip, rar,
> etc.) only partially can break the parsing and deduplication processing,
> since the file is incomplete?
>
> Thanks,
>
> Renxia
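
For reference, a per-document size cap like the one Renxia describes would normally go in conf/nutch-site.xml. The sketch below assumes the http.content.limit and ftp.content.limit properties as defined in nutch-default.xml (the "(ftp|http).content.size" name in the mail looks like a paraphrase); the property names, defaults, and the 1 MB value here should be checked against your Nutch version before use.

    <!-- Sketch of a nutch-site.xml override capping the bytes fetched per
         document. Property names assumed from nutch-default.xml
         (http.content.limit / ftp.content.limit); verify for your version. -->
    <property>
      <name>http.content.limit</name>
      <!-- maximum bytes downloaded per document over HTTP; -1 means no limit -->
      <value>1048576</value>
    </property>
    <property>
      <name>ftp.content.limit</name>
      <!-- the same cap for FTP fetches -->
      <value>1048576</value>
    </property>

Whether a file truncated at that limit can still be parsed or deduplicated correctly is a separate question from the config itself, which only controls how many bytes the fetcher downloads.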

