[
https://issues.apache.org/jira/browse/MAPREDUCE-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143456#comment-15143456
]
Jason Lowe commented on MAPREDUCE-6579:
---------------------------------------
It's a little unfortunate that YARN-3946 started putting non-fatal messages
into what is typically an app-driven diagnostic repository. Now all
applications will start getting these (probably mostly annoying) messages for
every job completion, assuming that most app frameworks dump the diagnostic
strings when the application completes. It seems these new messages only make
sense to report when the job is active and are mostly noise afterwards.
Back to the MapReduce side of this, IMHO we need to return diagnostics for any
case where we used to return diagnostics before. Since this is specific to
MapReduce, we can check the MR AM to see all the places where we could set a
diagnostic. Most places I found only set the diagnostic when the job fails,
but I did find at least one place where the diagnostic could be set yet the job
could succeed. When a task fails, a job diagnostic is added (see
JobImpl.TaskCompletedTransition#taskFailed). If the user configured the job to
allow some tasks to fail without failing the job, we could end up with a
successful job that has task-failure messages in its diagnostics.
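For illustration, here is a rough sketch of the kind of setup I mean (my own
example, not from any patch here; the percentage values are arbitrary):
{code}
import org.apache.hadoop.mapred.JobConf;

// Illustrative only: with a non-zero failures percent, the job can finish
// SUCCEEDED even though a few tasks failed and left task-failure messages
// in the job diagnostics.
JobConf conf = new JobConf();
conf.setMaxMapTaskFailuresPercent(5);     // tolerate up to 5% failed map tasks
conf.setMaxReduceTaskFailuresPercent(5);  // tolerate up to 5% failed reduce tasks
{code}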
However, that's a relatively rare configuration for a typical MapReduce job,
and I'm not sure how many downstream software stacks are going to start
getting upset when they see getFailureInfo start returning data on a regular
basis for successful jobs. It's rather unfortunate that the method is called
getFailureInfo yet will now routinely contain messages unrelated to any
failure. Downstream stacks should be checking the overall job status, not
whether the getFailureInfo result is empty, to know whether the job really
failed, so on one hand I'm leaning towards reporting these messages on success
as well. But then part of me thinks it will simply be annoying for every
successful job to dump a bunch of messages about waiting to schedule, waiting
to register, etc., which leads me to wonder whether we really want YARN-3946
to work the way it does.
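To make that concrete, here is a rough sketch of the kind of downstream check
I have in mind (my own illustration, not from any patch here): drive success
or failure off the job state and treat getFailureInfo purely as supplemental
text. The job setup below is hypothetical.
{code}
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

// Hypothetical driver, for illustration only.
public class DiagnosticsCheck {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();            // job details omitted
    RunningJob job = new JobClient(conf).submitJob(conf);
    job.waitForCompletion();

    if (job.isSuccessful()) {
      // Diagnostics may be non-empty here (e.g. scheduler messages about
      // waiting for AM resources); that alone does not mean the job failed.
      System.out.println("Job succeeded");
    } else {
      System.err.println("Job failed: " + job.getFailureInfo());
    }
  }
}
{code}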
> JobStatus#getFailureInfo should not output diagnostic information when the
> job is running
> -----------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6579
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6579
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Reporter: Rohith Sharma K S
> Assignee: Akira AJISAKA
> Priority: Blocker
> Attachments: MAPREDUCE-6579.01.patch, MAPREDUCE-6579.02.patch,
> MAPREDUCE-6579.03.patch, MAPREDUCE-6579.04.patch, MAPREDUCE-6579.05.patch
>
>
> From
> [https://builds.apache.org/job/PreCommit-YARN-Build/9976/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient-jdk1.8.0_66.txt]
> TestNetworkedJob fails intermittently.
> {code}
> Running org.apache.hadoop.mapred.TestNetworkedJob
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 81.131 sec
> <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob
> testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed:
> 30.55 sec <<< FAILURE!
> org.junit.ComparisonFailure: expected:<[[Tue Dec 15 14:02:45 +0000 2015]
> Application is Activated, waiting for resources to be assigned for AM.
> Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource =
> <memory:8192, vCores:16> ; Queue's Absolute capacity = 100.0 % ; Queue's
> Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; ]>
> but was:<[]>
> at org.junit.Assert.assertEquals(Assert.java:115)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at
> org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:174)
> {code}