On 2010-05-03 22:58, Emmanuel de Castro Santana wrote:
> "The first method of recovering that is mentioned there works only
> (sometimes) for crawls performed using LocalJobTracker and local file
> system. It does not work in any other case."
> 
> If I stop the crawling process, copy the crawled content from HDFS
> to my local disk, apply the fix, and then put it back into HDFS,
> would it work? Or would there be a problem with DFS replication of
> the new files?

Again, this procedure does NOT work when using HDFS - you won't even see
the partial output (without some serious hacking).

> 
> "Yes, but if the parsing fails you still have the downloaded content,
> which you can re-parse again after you fixed the config or the code..."
> 
> Interesting... I did not see an option like -noParsing in the
> "bin/nutch crawl" command. Does that mean I will have to write my
> own shell script for crawling, one that uses the fetcher's
> -noParsing option?

You can simply set the fetcher.parse config option to false.
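For example, a minimal sketch of the override in conf/nutch-site.xml
(assuming a standard Nutch 1.x setup):

  <property>
    <name>fetcher.parse</name>
    <value>false</value>
    <description>If false, fetch without parsing; parse each segment
    in a separate step.</description>
  </property>

You can then parse the fetched segment afterwards as a separate step,
e.g. (the segment path below is just an illustration):

  bin/nutch parse crawl/segments/20100503225800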


-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
