[
https://issues.apache.org/jira/browse/HADOOP-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720536#action_12720536
]
Hemanth Yamijala commented on HADOOP-6064:
------------------------------------------
For reference, this time the failure happened as follows:
- The test timed out in multipleQsWithOneQBeyondCapacity, while waiting for 5
map tasks to complete.
- The check for task completion assumes that all map tasks in
ControlledMapReduceJob run successfully. Note that the check is on
jip.finishedMaps(), which does not count failed tasks.
- However, one of the map tasks failed this time, with the following stack
trace:
{noformat}
[junit] 09/06/17 12:49:20 INFO mapred.TaskInProgress: Error from attempt_200906171248_0001_m_000003_0: java.io.FileNotFoundException: File signalFileDir-7646601804912829477/MAPS_0 does not exist.
[junit] at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:383)
[junit] at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:301)
[junit] at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:746)
[junit] at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:771)
[junit] at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:465)
[junit] at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:746)
[junit] at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:806)
[junit] at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:936)
[junit] at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:891)
[junit] at org.apache.hadoop.mapred.ControlledMapReduceJob.listSignalFiles(ControlledMapReduceJob.java:278)
[junit] at org.apache.hadoop.mapred.ControlledMapReduceJob.map(ControlledMapReduceJob.java:318)
[junit] at org.apache.hadoop.mapred.ControlledMapReduceJob.map(ControlledMapReduceJob.java:60)
[junit] at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
[junit] at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:363)
[junit] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:312)
[junit] at org.apache.hadoop.mapred.Child.main(Child.java:159)
{noformat}
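- This, in turn, seems to be related to the problem described in HADOOP-4167.
The mappers all list the contents of a directory looking for 'signal' files.
Because these signal files are renamed concurrently, an entry that shows up in
a listing can be gone by the time the mapper looks up its status, which is the
FileNotFoundException seen above.
- Since the failed task is never counted by finishedMaps(), the test waits
forever for the expected count and eventually times out.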
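The hang comes from polling a count that cannot reach its target once a map has failed. A minimal sketch of a wait loop that avoids this, assuming two hypothetical counters, `finished` and `failed` (stand-ins for whatever the test reads from the job; only `jip.finishedMaps()` is named above), plus an explicit deadline. This is an illustration, not the test's actual code:

```java
import java.util.function.IntSupplier;

/** Hypothetical helper; the suppliers stand in for the job's finished/failed map counts. */
class CompletionWait {
    /**
     * Polls until `expected` maps have finished. Returns true on success and false
     * if any map has failed or `timeoutMs` elapses first, so a failure that the
     * finished count never reflects cannot make the caller wait forever.
     */
    static boolean waitForMaps(IntSupplier finished, IntSupplier failed,
                               int expected, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (failed.getAsInt() > 0) {
                return false; // failed maps never show up in the finished count: stop now
            }
            if (finished.getAsInt() >= expected) {
                return true;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return false; // deadline reached without enough finished maps
    }
}
```

Returning early on the first failure is a deliberately strict choice: it turns a failed map into an immediate, explicit test failure instead of an unexplained timeout.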
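On the listing side, one way to tolerate signal files that are renamed mid-scan is to treat an entry that vanishes between the listing and the status lookup as simply absent. The sketch below shows the idea with plain `java.nio.file` rather than Hadoop's `FileSystem.globStatus`; the `SignalFiles` class, the `listSignalFiles` method and the name-prefix convention are hypothetical, and none of it is taken from ControlledMapReduceJob or from any fix for HADOOP-4167:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating a rename-tolerant directory scan. */
class SignalFiles {
    /**
     * Lists regular files in `dir` whose names start with `prefix` (assumed to
     * contain no glob metacharacters). Entries renamed away by another task
     * between the listing and the attribute lookup are skipped, and a missing
     * directory yields an empty list instead of an exception.
     */
    static List<Path> listSignalFiles(Path dir, String prefix) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path p : entries) {
                try {
                    BasicFileAttributes attrs =
                            Files.readAttributes(p, BasicFileAttributes.class);
                    if (attrs.isRegularFile()) {
                        result.add(p);
                    }
                } catch (NoSuchFileException e) {
                    // Renamed after it was listed: treat it as absent.
                }
            }
        } catch (NoSuchFileException e) {
            // The directory itself is gone: no signal files are visible right now.
        }
        return result;
    }
}
```

Skipping vanished entries only removes the exception; the caller still has to treat each listing as a snapshot that may already be out of date.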
> Rewrite TestQueueCapacities to make it simpler and avoid timeout errors
> -----------------------------------------------------------------------
>
> Key: HADOOP-6064
> URL: https://issues.apache.org/jira/browse/HADOOP-6064
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched, test
> Affects Versions: 0.20.0
> Reporter: Hemanth Yamijala
>
> We have seen TestQueueCapacities fail periodically, and a couple of earlier
> fixes only partially solved the problem, the most recent being HADOOP-5869.
> I found another failure while running the tests locally as part of testing a
> different patch. This time the symptom was different from the ones we've seen
> before. The core problem is that the test is too complex and relies on too
> many things working correctly to be useful. It would make sense to revisit the
> purpose of the test and see whether a simpler model can serve it.