Hi,
I noticed that Nutch doesn't clean up after itself (i.e. remove its temp
folders) when a job fails.
The following classes create temp directories but do not remove them when
an error occurs:
1. Injector
2. CrawlDBReader
3. Deduplication
4. SegmentReader
For example, in Injector you find:
RunningJob mapJob = JobClient.runJob(sortJob);
which is not wrapped in a try/catch block such as:
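RunningJob mapJob;  // declared outside the try so it stays in scope afterwards
try {
  mapJob = JobClient.runJob(sortJob);
} catch (IOException e) {
  // remove the temp directory (recursively) before propagating the failure
  fs.delete(tempDir, true);
  throw e;
}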
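To make the idea concrete, here is a rough standalone sketch of the same pattern. It is not Nutch code: it uses java.nio.file instead of Hadoop's FileSystem so that it runs without any dependencies, and the names TempDirCleanup, Job, runWithCleanup and the "inject-temp-" prefix are made up for illustration. The Job interface simply stands in for a call such as JobClient.runJob(sortJob).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class TempDirCleanup {

    /** Hypothetical stand-in for a step such as JobClient.runJob(sortJob). */
    interface Job {
        void run(Path tempDir) throws IOException;
    }

    /** Deletes dir and everything under it, children first (like fs.delete(tempDir, true)). */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits files before the directories that contain them.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /**
     * Creates a temp directory, runs the job against it, and removes the
     * directory again if the job fails. On success the directory is returned
     * and the caller is responsible for it.
     */
    static Path runWithCleanup(Job job) throws IOException {
        Path tempDir = Files.createTempDirectory("inject-temp-");
        try {
            job.run(tempDir);
            return tempDir;
        } catch (IOException e) {
            try {
                deleteRecursively(tempDir);
            } catch (IOException | UncheckedIOException cleanupError) {
                // Keep the original failure as the primary exception.
                e.addSuppressed(cleanupError);
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path[] seen = new Path[1];
        try {
            runWithCleanup(dir -> {
                seen[0] = dir;
                Files.createFile(dir.resolve("part-00000"));
                throw new IOException("simulated job failure");
            });
        } catch (IOException e) {
            System.out.println("job failed: " + e.getMessage());
        }
        System.out.println("temp dir still exists: " + Files.exists(seen[0]));
    }
}
```

The cleanup error is attached as a suppressed exception so that the original job failure is what the caller sees. In the real classes the same shape would presumably apply, with fs.delete(tempDir, true) in place of deleteRecursively.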
Should I create a Jira ticket with patches for this?
Regards,
Diaa