On 2010-05-03 19:59, Emmanuel de Castro Santana wrote:
> "Unfortunately, no. You should at least crawl without parsing, so that
> when you download the content you can run the parsing separately, and
> repeat it if it fails."
> 
> I've just found this in the FAQ, can it be done ?
> 
> http://wiki.apache.org/nutch/FAQ#How_can_I_recover_an_aborted_fetch_process.3F

The first recovery method mentioned there works only (sometimes) for
crawls performed using the LocalJobTracker and the local file system.
It does not work in any other case.

> 
> By the way, about not parsing, isn't it necessary to parse the content
> anyway in order to generate links for the next segment ? If this is true,
> one would have to run parse separately, which would amount to the same thing.

Yes, but if the parsing fails you still have the downloaded content,
which you can re-parse after you fix the config or the code...
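As a rough sketch, the fetch-first, parse-separately cycle described
above could look like this with the Nutch 1.x command-line tools
(assuming `fetcher.parse` has been set to false in conf/nutch-site.xml,
and that the crawl data lives under a hypothetical crawl/ directory):

```shell
# Disable in-fetch parsing so the raw downloaded content survives a
# parser failure. In conf/nutch-site.xml:
#   <property>
#     <name>fetcher.parse</name>
#     <value>false</value>
#   </property>

# Generate a new fetch list and pick the newest segment directory
bin/nutch generate crawl/crawldb crawl/segments
SEGMENT=crawl/segments/$(ls -t crawl/segments | head -1)

# Download only; content is stored in the segment on disk
bin/nutch fetch "$SEGMENT"

# Parse as a separate step; if it fails, fix the config or plugin
# code and simply re-run this command against the same segment
bin/nutch parse "$SEGMENT"

# Feed the newly parsed outlinks back into the crawldb so the next
# generate round can use them
bin/nutch updatedb crawl/crawldb "$SEGMENT"
```

Since the fetched content and the parse output are stored separately
within the segment, only the parse step needs repeating after a failure.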


-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
