[ 
https://issues.apache.org/jira/browse/NUTCH-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478028#comment-17478028
 ] 

ASF GitHub Bot commented on NUTCH-2923:
---------------------------------------

sebastian-nagel commented on a change in pull request #721:
URL: https://github.com/apache/nutch/pull/721#discussion_r786958191



##########
File path: src/java/org/apache/nutch/util/SitemapProcessor.java
##########
@@ -402,9 +402,8 @@ public void sitemap(Path crawldb, Path hostdb, Path sitemapUrlDir, boolean stric
     try {
       boolean success = job.waitForCompletion(true);
       if (!success) {
-        String message = "SitemapProcessor_" + crawldb.toString()
-            + " job did not succeed, job status: " + job.getStatus().getState()
-            + ", reason: " + job.getStatus().getFailureInfo();
+        String message = NutchJob.getJobFailureLogMessage(
+            "SitemapProcessor_" + crawldb.toString(), job);

Review comment:
       I know it was already there, but maybe shorten the job name in the message 
to just "SitemapProcessor" and leave out the path to the CrawlDb?

##########
File path: src/java/org/apache/nutch/util/NutchJob.java
##########
@@ -81,4 +83,26 @@ public static void cleanupAfterFailure(Path tempDir, Path lock, FileSystem fs)
     }
   }
 
+  /**
+   * Method to return job failure log message. To be used across all Jobs
+   * 
+   * @param name
+   *          Name/Type of the job
+   * @param job
+   *          Job Object for Job details
+   * @return job failure log message
+   * @throws IOException
+   *           Can occur during fetching job status
+   * @throws InterruptedException
+   *           Can occur during fetching job status
+   */
+  public static String getJobFailureLogMessage(String name, Job job)
+      throws IOException, InterruptedException {
+    if (job != null) {
+      return String.format(JOB_FAILURE_LOG_FORMAT, name, job.getJobID(),
+          job.getStatus(), job.getStatus().getFailureInfo());

Review comment:
       Really `job.getStatus()` instead of `job.getStatus().getState()`?
   
   The log output doesn't look readable:
   ```
   2022-01-18 17:37:04,892 ERROR o.a.n.c.Injector [main] Injector job did not succeed, job id: job_local138302276_0001, job status: job-id : job_local138302276_0001uber-mode : falsemap-progress : 1.0reduce-progress : 0.0cleanup-progress : 1.0setup-progress : 1.0runstate : FAILEDstart-time : 0user-name : sebpriority : DEFAULTscheduling-info : NAnum-used-slots0num-reserved-slots0used-mem0reserved-mem0needed-mem0, reason: NA
   ```
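
For comparison, a minimal sketch of how the message might read when only the run state is formatted rather than the whole status object. It is written without Hadoop on the classpath, so the job ID, state and failure info are passed in as plain values; the `State` enum is a hypothetical stand-in for what `job.getStatus().getState()` returns, and the format string is an assumption inferred from the log line above, since `JOB_FAILURE_LOG_FORMAT` itself is not shown in this diff.

```java
class JobFailureMessageSketch {

  // Assumed format, inferred from the log line quoted above; the actual
  // JOB_FAILURE_LOG_FORMAT constant is not visible in this diff.
  static final String JOB_FAILURE_LOG_FORMAT =
      "%s job did not succeed, job id: %s, job status: %s, reason: %s";

  // Hypothetical stand-in for the run state of a job
  // (what job.getStatus().getState() would provide).
  enum State { RUNNING, SUCCEEDED, FAILED, PREP, KILLED }

  // Builds the failure message from the short run state only,
  // not from the full multi-field status dump.
  static String getJobFailureLogMessage(String name, String jobId,
      State state, String failureInfo) {
    return String.format(JOB_FAILURE_LOG_FORMAT, name, jobId, state,
        failureInfo);
  }

  public static void main(String[] args) {
    // Values taken from the log line above.
    System.out.println(getJobFailureLogMessage("Injector",
        "job_local138302276_0001", State.FAILED, "NA"));
  }
}
```

With the values from the log above, this would give a single short line ending in `job status: FAILED, reason: NA`.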






> Add Job Id in Job Failure messages
> ----------------------------------
>
>                 Key: NUTCH-2923
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2923
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb, fetcher, generator, injector
>            Reporter: Prakhar Chaube
>            Priority: Minor
>
> In Job classes, when a job doesn't succeed, a failure message is initialized 
> as follows:
> _String message = "Generator job did not succeed, job status:"_
>               _+ job.getStatus().getState() + ", reason: "_
>               _+ job.getStatus().getFailureInfo();_
> However, this message doesn't contain the JobId, which can make it really hard 
> to find the exact job from the logs. Adding the JobId to this message will 
> make the job easier to track and locate for debugging.


