On 2010-05-03 22:58, Emmanuel de Castro Santana wrote:
> "The first method of recovering that is mentioned there works only
> (sometimes) for crawls performed using LocalJobTracker and local file
> system. It does not work in any other case."
>
> If I stop the crawling process, take the crawled content from the DFS onto
> my local disk, do the fix, and then put it back into HDFS, would it work? Or
> would there be a problem with DFS replication of the new files?
Again, this procedure does NOT work when using HDFS - you won't even see
the partial output (without some serious hacking).

> "Yes, but if the parsing fails you still have the downloaded content,
> which you can re-parse again after you fixed the config or the code..."
>
> Interesting... I did not see any option like -noParsing in the "bin/nutch
> crawl" command. Does that mean I will have to code my own .sh for crawling,
> one that uses the -noParsing option of the fetcher?

You can simply set the fetcher.parsing config option to false.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  || |   Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
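P.S. For completeness, a sketch of how that override could look in conf/nutch-site.xml (the property name is the one mentioned above; the description text is my own paraphrase, not copied from nutch-default.xml):

```xml
<!-- conf/nutch-site.xml: overrides take precedence over nutch-default.xml -->
<property>
  <name>fetcher.parsing</name>
  <value>false</value>
  <description>When false, the fetcher only downloads content; parsing
  is deferred to a separate parse step instead of happening inline.
  </description>
</property>
```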