Hi everyone, I am having problems getting summaries from DFS segments. Intermittently, on random segments and random DFS blocks, I get the following error:
java.io.IOException: Could not obtain block: blk_1996629287798238182 file=/data/crawl/segments/20060616121845/parse_text/part-00047/index offset=0

Could this problem be related to my hadoop-site configuration, or to the versions I am running (hadoop-0.4.0 and nutch-2006-07-19)? When I read the parse_text of the URL with the segread script, there is no problem, so I assume it is not related to particular URLs.
