[ https://issues.apache.org/jira/browse/NUTCH-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425377#comment-16425377 ]
Hudson commented on NUTCH-2518:
-------------------------------
SUCCESS: Integrated in Jenkins build Nutch-trunk #3515 (See
[https://builds.apache.org/job/Nutch-trunk/3515/])
NUTCH-2518 Cleaning up the file system after a job failure. (snagel:
[https://github.com/apache/nutch/commit/5907604b341f3eda1aae924606fce9022446132c])
* (edit) src/java/org/apache/nutch/crawl/CrawlDbMerger.java
* (edit) src/java/org/apache/nutch/crawl/LinkDbReader.java
* (edit) src/java/org/apache/nutch/indexer/IndexingJob.java
* (edit) src/java/org/apache/nutch/crawl/Injector.java
* (edit) src/java/org/apache/nutch/tools/warc/WARCExporter.java
* (edit) src/java/org/apache/nutch/util/SitemapProcessor.java
* (edit) src/java/org/apache/nutch/crawl/CrawlDbReader.java
* (edit) src/java/org/apache/nutch/crawl/CrawlDb.java
* (edit) src/java/org/apache/nutch/hostdb/ReadHostDb.java
* (edit) src/java/org/apache/nutch/segment/SegmentMerger.java
* (edit) src/java/org/apache/nutch/crawl/Generator.java
* (edit) src/java/org/apache/nutch/crawl/LinkDb.java
* (edit) src/java/org/apache/nutch/fetcher/Fetcher.java
* (edit) src/java/org/apache/nutch/indexer/CleaningJob.java
* (edit) src/java/org/apache/nutch/tools/FreeGenerator.java
* (edit) src/java/org/apache/nutch/crawl/LinkDbMerger.java
* (edit) src/java/org/apache/nutch/segment/SegmentReader.java
* (edit) src/java/org/apache/nutch/tools/arc/ArcSegmentCreator.java
* (edit) src/java/org/apache/nutch/util/NutchJob.java
* (edit) src/java/org/apache/nutch/hostdb/UpdateHostDb.java
* (edit) src/java/org/apache/nutch/parse/ParseSegment.java
* (edit) src/java/org/apache/nutch/crawl/DeduplicationJob.java
> Must check return value of job.waitForCompletion()
> --------------------------------------------------
>
> Key: NUTCH-2518
> URL: https://issues.apache.org/jira/browse/NUTCH-2518
> Project: Nutch
> Issue Type: Bug
> Components: crawldb, fetcher, generator, hostdb, linkdb
> Affects Versions: 1.15
> Reporter: Sebastian Nagel
> Priority: Blocker
> Fix For: 1.15
>
>
> The return value of job.waitForCompletion() in the new MapReduce API
> (NUTCH-2375) must always be checked. If it is not true, the job has
> failed or been killed. Accordingly, the program
> - should not proceed with further jobs/steps
> - must clean up temporary data, unlock CrawlDB, etc.
> - should exit with a non-zero exit value, so that scripts running the
>   crawl workflow can handle the failure
> Cf. NUTCH-2076, NUTCH-2442, [NUTCH-2375 PR
> #221|https://github.com/apache/nutch/pull/221#issuecomment-332941883].
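
The three requirements in the description can be illustrated with a minimal sketch. It is not the code from the commit: CompletableJob is a hypothetical stand-in for Hadoop's Job so that the example runs without Hadoop on the classpath, and JobResultCheck, cleanup(), runStep() and runWorkflow() are names invented for illustration. Only the boolean result of waitForCompletion(boolean) is taken from the real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for org.apache.hadoop.mapreduce.Job (assumption, not the Hadoop class). */
interface CompletableJob {
  /** Mirrors Job.waitForCompletion(boolean): true only if the job succeeded. */
  boolean waitForCompletion(boolean verbose) throws Exception;
}

class JobResultCheck {

  /** Records cleanup calls so the behaviour of the sketch can be inspected. */
  static final List<String> cleanupLog = new ArrayList<>();

  /** Placeholder for removing temporary output and releasing locks (hypothetical). */
  static void cleanup(String stepName) {
    cleanupLog.add("cleanup:" + stepName);
  }

  /** Runs one step and returns 0 on success, 1 if the job failed, was killed, or threw. */
  static int runStep(String stepName, CompletableJob job) {
    try {
      boolean success = job.waitForCompletion(true);
      if (!success) {
        // The job failed or was killed: clean up and report the failure.
        cleanup(stepName);
        return 1;
      }
      return 0;
    } catch (Exception e) {
      if (e instanceof InterruptedException) {
        // Preserve the interrupt status for the caller.
        Thread.currentThread().interrupt();
      }
      cleanup(stepName);
      return 1;
    }
  }

  /** Runs the steps in iteration order and stops at the first failing step. */
  static int runWorkflow(Map<String, CompletableJob> steps) {
    for (Map.Entry<String, CompletableJob> step : steps.entrySet()) {
      int result = runStep(step.getKey(), step.getValue());
      if (result != 0) {
        // Do not proceed with further jobs/steps.
        return result;
      }
    }
    return 0;
  }
}
```

A command-line tool would typically pass the value returned by runWorkflow() to System.exit(), so that a shell script driving the crawl can test the exit status. The map passed in should preserve insertion order (for example a LinkedHashMap), otherwise the steps would not run in the intended sequence.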