Hi,
I noticed that Nutch does not clean up (remove its temp folders) when a job
fails.
The following classes create temp directories but do not remove them when
an error occurs:
1. Injector
2. CrawlDBReader
3. Deduplication
4. SegmentReader

For example, in Injector you find:
RunningJob mapJob = JobClient.runJob(sortJob);

which is not wrapped in a try/catch block such as:
    RunningJob mapJob;
    try
    {
        mapJob = JobClient.runJob(sortJob);
    }
    catch (IOException e)
    {
        // remove the temp directory before propagating the failure
        fs.delete(tempDir, true);
        throw e;
    }
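
To show the cleanup pattern in isolation, here is a minimal, self-contained sketch that runs without Hadoop. It is not the Nutch code: it uses java.nio.file in place of Hadoop's FileSystem, and the names TempDirCleanup, Job, runWithCleanup and deleteRecursively are hypothetical, made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TempDirCleanup {

    /** Hypothetical stand-in for a call such as JobClient.runJob(sortJob). */
    @FunctionalInterface
    public interface Job {
        void run(Path tempDir) throws IOException;
    }

    /** Deletes dir and everything below it (similar in spirit to fs.delete(tempDir, true)). */
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Runs the job; on failure removes tempDir, then rethrows the original error. */
    public static void runWithCleanup(Path tempDir, Job job) throws IOException {
        try {
            job.run(tempDir);
        } catch (IOException e) {
            try {
                deleteRecursively(tempDir);
            } catch (IOException cleanupError) {
                // Keep the original failure as the primary exception.
                e.addSuppressed(cleanupError);
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tempDir = Files.createTempDirectory("inject-temp");
        try {
            runWithCleanup(tempDir, dir -> {
                Files.createFile(dir.resolve("part-00000"));
                throw new IOException("simulated job failure");
            });
        } catch (IOException e) {
            System.out.println("temp dir still exists: " + Files.exists(tempDir));
        }
    }
}
```

On success the temp directory is left in place, since the caller normally still needs its contents; only the failure path removes it.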

Should I create a Jira ticket with patches for this?

Regards,
Diaa
