[jira] [Commented] (NUTCH-2518) Must check return value of job.waitForCompletion()

Hudson (JIRA) Wed, 04 Apr 2018 04:54:14 -0700

    [ https://issues.apache.org/jira/browse/NUTCH-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425377#comment-16425377 ]


Hudson commented on NUTCH-2518:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-trunk #3515 (See 
[https://builds.apache.org/job/Nutch-trunk/3515/])
NUTCH-2518 Cleaning up the file system after a job failure. (snagel: 
[https://github.com/apache/nutch/commit/5907604b341f3eda1aae924606fce9022446132c])
* (edit) src/java/org/apache/nutch/crawl/CrawlDbMerger.java
* (edit) src/java/org/apache/nutch/crawl/LinkDbReader.java
* (edit) src/java/org/apache/nutch/indexer/IndexingJob.java
* (edit) src/java/org/apache/nutch/crawl/Injector.java
* (edit) src/java/org/apache/nutch/tools/warc/WARCExporter.java
* (edit) src/java/org/apache/nutch/util/SitemapProcessor.java
* (edit) src/java/org/apache/nutch/crawl/CrawlDbReader.java
* (edit) src/java/org/apache/nutch/crawl/CrawlDb.java
* (edit) src/java/org/apache/nutch/hostdb/ReadHostDb.java
* (edit) src/java/org/apache/nutch/segment/SegmentMerger.java
* (edit) src/java/org/apache/nutch/crawl/Generator.java
* (edit) src/java/org/apache/nutch/crawl/LinkDb.java
* (edit) src/java/org/apache/nutch/fetcher/Fetcher.java
* (edit) src/java/org/apache/nutch/indexer/CleaningJob.java
* (edit) src/java/org/apache/nutch/tools/FreeGenerator.java
* (edit) src/java/org/apache/nutch/crawl/LinkDbMerger.java
* (edit) src/java/org/apache/nutch/segment/SegmentReader.java
* (edit) src/java/org/apache/nutch/tools/arc/ArcSegmentCreator.java
* (edit) src/java/org/apache/nutch/util/NutchJob.java
* (edit) src/java/org/apache/nutch/hostdb/UpdateHostDb.java
* (edit) src/java/org/apache/nutch/parse/ParseSegment.java
* (edit) src/java/org/apache/nutch/crawl/DeduplicationJob.java


> Must check return value of job.waitForCompletion()
> --------------------------------------------------
>
>                 Key: NUTCH-2518
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2518
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb, fetcher, generator, hostdb, linkdb
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Blocker
>             Fix For: 1.15
>
>
> The return value of job.waitForCompletion() of the new MapReduce API 
> (NUTCH-2375) must always be checked. If it is not true, the job has 
> failed or been killed. Accordingly, the program
> - should not proceed with further jobs/steps
> - must clean up temporary data, unlock CrawlDB, etc.
> - must exit with a non-zero exit value, so that scripts running the crawl 
> workflow can handle the failure
> Cf. NUTCH-2076, NUTCH-2442, [NUTCH-2375 PR 
> #221|https://github.com/apache/nutch/pull/221#issuecomment-332941883].
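
The three requirements above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `WaitableJob` is a hypothetical stand-in for `org.apache.hadoop.mapreduce.Job` (whose `waitForCompletion(boolean)` likewise returns a boolean), so that the example runs with the JDK alone. The `runStep` and `runWorkflow` helpers and the cleanup callback are likewise assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JobCheckSketch {

  /** Hypothetical stand-in for org.apache.hadoop.mapreduce.Job (assumption). */
  public interface WaitableJob {
    boolean waitForCompletion(boolean verbose) throws Exception;
  }

  /**
   * Runs one step and checks the return value of waitForCompletion().
   * Returns 0 on success. On failure (false or an exception) it runs the
   * cleanup callback and returns 1.
   */
  public static int runStep(String name, WaitableJob job, Runnable cleanup) {
    boolean success;
    try {
      success = job.waitForCompletion(true);
    } catch (Exception e) {
      System.err.println(name + " threw: " + e);
      success = false;
    }
    if (!success) {
      System.err.println(name + " failed or was killed, cleaning up");
      // Placeholder for removing temporary data, unlocking the CrawlDb, etc.
      cleanup.run();
      return 1;
    }
    return 0;
  }

  /** Runs the steps in order and stops at the first failure. */
  public static int runWorkflow(List<WaitableJob> jobs, List<String> log) {
    for (int i = 0; i < jobs.size(); i++) {
      String name = "step-" + i;
      int rc = runStep(name, jobs.get(i), () -> log.add("cleanup " + name));
      if (rc != 0) {
        return rc; // do not proceed with further jobs/steps
      }
      log.add(name + " ok");
    }
    return 0;
  }

  public static void main(String[] args) {
    WaitableJob ok = verbose -> true;
    List<String> log = new ArrayList<>();
    int rc = runWorkflow(Arrays.asList(ok, ok), log);
    System.out.println(log + " exit=" + rc);
    if (rc != 0) {
      // Non-zero exit value so that calling scripts can detect the failure.
      System.exit(rc);
    }
  }
}
```

Returning an exit code from each step, rather than calling `System.exit` inside it, keeps the cleanup and the stop-on-failure decision in one place and leaves the final exit to `main`.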



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
