[
https://issues.apache.org/jira/browse/MAPREDUCE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154126#comment-13154126
]
Ravi Gummadi commented on MAPREDUCE-3440:
-----------------------------------------
MAPREDUCE-3121 has tests for the basic DFIP functionality:
* The NodeManager's LocalDirsHandlerService detects directory failures.
* The node is marked unhealthy when a large percentage of its directories go bad.
* The RM stops scheduling containers on the node when a large percentage of its directories go bad.
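The health rule exercised by the last two checks can be modelled as a simple threshold on the fraction of good directories. This is a minimal, hypothetical sketch for writing test expectations, not the NodeManager implementation: the class name, the method, and the 0.25 minimum healthy fraction are assumptions chosen for illustration.

```java
/**
 * Hypothetical model of the node health rule: a node stays healthy only while
 * the fraction of good local dirs is at least a configured minimum.
 * Not the actual NodeManager code; names and threshold are assumptions.
 */
final class DiskHealthSketch {
    private DiskHealthSketch() {}

    /** Returns true if goodDirs / totalDirs >= minHealthyFraction. */
    static boolean isNodeHealthy(int totalDirs, int goodDirs, double minHealthyFraction) {
        if (totalDirs <= 0) {
            return false; // no configured dirs: treated as unhealthy (assumption)
        }
        return (double) goodDirs / totalDirs >= minHealthyFraction;
    }

    public static void main(String[] args) {
        double minHealthy = 0.25; // hypothetical threshold for illustration
        System.out.println(isNodeHealthy(4, 3, minHealthy)); // 3 of 4 good: true
        System.out.println(isNodeHealthy(4, 0, minHealthy)); // all bad: false
    }
}
```

A test would take disks offline until the modelled rule flips to unhealthy and then assert that the node reports unhealthy and receives no new containers.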
More tests can be added to cover other components when disks fail. Here is the list Vinod proposed on MAPREDUCE-3121:
* Integration test: Run a MapReduce job (so that Shuffle is also exercised), take some disks offline, run another job, and verify that both apps succeed.
* LogAggregation test: Verify that logs written to bad disks are ignored during aggregation (augment TestLogAggregationService). TODO.
* ContainerLaunch: Verify that
** new containers don't use bad directories (by checking the LOCAL_DIRS environment variable in a custom map job).
** if a large percentage of disks turn bad,
*** the container exits with the proper exit code (should be easy with a custom application).
*** localization of a resource fails.
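For the LOCAL_DIRS check, a custom map task could split the variable and compare each entry against the directories the test took offline. A minimal sketch, assuming LOCAL_DIRS is a comma-separated list of paths and that a bad disk's root directory is a path prefix of any entry placed on it; the class name, method name, and sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for a custom map task that checks LOCAL_DIRS. */
final class LocalDirsCheck {
    private LocalDirsCheck() {}

    /**
     * Splits a comma-separated LOCAL_DIRS value (format assumed) and returns
     * every entry that equals, or lies under, one of the bad directories.
     */
    static List<String> badDirsInUse(String localDirsEnv, Set<String> badDirs) {
        List<String> offending = new ArrayList<>();
        if (localDirsEnv == null || localDirsEnv.isEmpty()) {
            return offending;
        }
        for (String dir : localDirsEnv.split(",")) {
            String trimmed = dir.trim();
            for (String bad : badDirs) {
                // Match the dir itself or a path below it, but not "/disk20" for "/disk2".
                if (trimmed.equals(bad) || trimmed.startsWith(bad + "/")) {
                    offending.add(trimmed);
                    break;
                }
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        // Hypothetical set of directories the test took offline.
        Set<String> badDirs = new HashSet<>(Arrays.asList("/disk2/nm-local-dir"));
        // Inside a real container, read the variable; fall back to a sample value here.
        String env = System.getenv("LOCAL_DIRS");
        if (env == null) {
            env = "/disk1/nm-local-dir/usercache/u/appcache/app_1";
        }
        List<String> offending = badDirsInUse(env, badDirs);
        if (!offending.isEmpty()) {
            throw new IllegalStateException("Container uses bad dirs: " + offending);
        }
        System.out.println("LOCAL_DIRS contains no bad directories");
    }
}
```

Failing the task on a match makes the job itself fail, so the test only needs to assert that the job succeeds after the disks are taken offline.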
> Add tests for testing other NM components with disk failures
> ------------------------------------------------------------
>
> Key: MAPREDUCE-3440
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3440
> Project: Hadoop Map/Reduce
> Issue Type: Test
> Affects Versions: 0.23.0
> Reporter: Ravi Gummadi
>
> Add more tests to test other components when disks fail.