[
https://issues.apache.org/jira/browse/MAPREDUCE-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143456#comment-15143456
]
Jason Lowe commented on MAPREDUCE-6579:
---------------------------------------
It's a little unfortunate that YARN-3946 started putting non-fatal messages
into what is typically an app-driven diagnostic repository. Now all
applications will start getting these (probably mostly annoying) messages for
every job completion, assuming that most app frameworks dump the diagnostic
strings when the application completes. It seems these new messages only make
sense to report when the job is active and are mostly noise afterwards.
Back to the MapReduce side of this, IMHO we need to return diagnostics for any
case where we used to return diagnostics before. Since this is specific to
MapReduce, we can check the MR AM to see all the places where we could set a
diagnostic. Most places I found only set the diagnostic when the job fails,
but I did find at least one place where the diagnostic could be set yet the job
could succeed. When a task fails, a job diagnostic is added (see
JobImpl.TaskCompletedTransition#taskFailed). If the user configured the job to
allow some tasks to fail without failing the job, we could end up with a
successful job that has task-failure messages in its diagnostics.
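For illustration, here is a rough sketch of the kind of setup I mean (my own
example, not from any patch here; the percentage values are arbitrary):
{code}
import org.apache.hadoop.mapred.JobConf;

// Illustrative only: with a non-zero failures percent, the job can finish
// SUCCEEDED even though a few tasks failed and left task-failure messages
// in the job diagnostics.
JobConf conf = new JobConf();
conf.setMaxMapTaskFailuresPercent(5);     // tolerate up to 5% failed map tasks
conf.setMaxReduceTaskFailuresPercent(5);  // tolerate up to 5% failed reduce tasks
{code}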
However, that's a relatively rare configuration for a typical MapReduce job,
and I'm not sure how many downstream software stacks are going to start
getting upset when they see getFailureInfo start returning data on a regular
basis for successful jobs. It's rather unfortunate that the method is called
getFailureInfo yet will now routinely contain messages unrelated to any
failure. Downstream stacks should be checking the overall job status, not
whether the getFailureInfo result is empty, to know whether the job really
failed, so on one hand I'm leaning towards reporting these messages on success
as well. But then part of me thinks it will simply be annoying for every
successful job to dump a bunch of messages about waiting to schedule, waiting
to register, etc., which leads me to wonder whether we really want YARN-3946
to work the way it does.
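To make that concrete, here is a rough sketch of the kind of downstream check
I have in mind (my own illustration, not from any patch here): drive success
or failure off the job state and treat getFailureInfo purely as supplemental
text. The job setup below is hypothetical.
{code}
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

// Hypothetical driver, for illustration only.
public class DiagnosticsCheck {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();            // job details omitted
    RunningJob job = new JobClient(conf).submitJob(conf);
    job.waitForCompletion();

    if (job.isSuccessful()) {
      // Diagnostics may be non-empty here (e.g. scheduler messages about
      // waiting for AM resources); that alone does not mean the job failed.
      System.out.println("Job succeeded");
    } else {
      System.err.println("Job failed: " + job.getFailureInfo());
    }
  }
}
{code}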
> JobStatus#getFailureInfo should not output diagnostic information when the
> job is running
> -----------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6579
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6579
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Reporter: Rohith Sharma K S
> Assignee: Akira AJISAKA
> Priority: Blocker
> Attachments: MAPREDUCE-6579.01.patch, MAPREDUCE-6579.02.patch,
> MAPREDUCE-6579.03.patch, MAPREDUCE-6579.04.patch, MAPREDUCE-6579.05.patch
>
>
> From
> [https://builds.apache.org/job/PreCommit-YARN-Build/9976/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient-jdk1.8.0_66.txt]
> TestNetworkedJob fails intermittently.
> {code}
> Running org.apache.hadoop.mapred.TestNetworkedJob
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 81.131 sec
> <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob
> testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed:
> 30.55 sec <<< FAILURE!
> org.junit.ComparisonFailure: expected:<[[Tue Dec 15 14:02:45 +0000 2015]
> Application is Activated, waiting for resources to be assigned for AM.
> Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource =
> <memory:8192, vCores:16> ; Queue's Absolute capacity = 100.0 % ; Queue's
> Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; ]>
> but was:<[]>
> at org.junit.Assert.assertEquals(Assert.java:115)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at
> org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:174)
> {code}