Hi all, I am running Nutch on my own laptop and I'd like to set a limit via (ftp|http).content.limit so that the crawl won't spend a long time downloading huge files and possibly run into Java heap size issues. However, I wonder whether downloading files only partially (especially compressed files like zip, rar, etc.) can break the parsing and deduplication steps, since the files would be incomplete?
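For context, this is roughly what I was planning to put in my conf/nutch-site.xml, assuming I've got the property names right (they appear in nutch-default.xml as http.content.limit and ftp.content.limit, with the value in bytes and -1 meaning no limit):

    <!-- conf/nutch-site.xml: cap each fetched document at 1 MB -->
    <property>
      <name>http.content.limit</name>
      <value>1048576</value>
    </property>
    <property>
      <name>ftp.content.limit</name>
      <value>1048576</value>
    </property>

My concern is about what happens downstream once a file is truncated at that 1 MB cap.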
Thanks, Renxia

