[ https://issues.apache.org/jira/browse/HADOOP-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661503#action_12661503 ]
Vinod K V commented on HADOOP-4939:
-----------------------------------

Few comments:
- The patch breaks compilation because of the change in the ClusterStatus constructor:
-- src/mapred/org/apache/hadoop/mapred/LocalJobRunner.java +389
-- src/test/org/apache/hadoop/mapred/TestJobQueueTaskScheduler.java:138
- When SleepJob (or rather the examples) is not on the path, the test fails, but with the following output:
{code}
JOB org.apache.hadoop.examples.SleepJob failed to run
Waiting for the job org.apache.hadoop.examples.SleepJob to start
{code}
We should avoid the last line, *if* we can.
- We could report the progress of the jobs periodically while the tests run. Currently the test stays silent until the progress reaches the threshold values.
- I think writing statements to a LOG is better than printing to standard output.
- With a HOD allocation, the testcase simulating lost TaskTrackers fails even though the keys are set up. This is because hadoop-daemons.sh tries, on the remote nodes, to change to the HADOOP_HOME directory, which does not exist there:
{code}
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
{code}
A simple solution would be to throw away all the changes to hadoop-daemon.sh and hadoop-daemons.sh and simply use slaves.sh, as follows:
{code}
HOSTLIST=conf/_reliability_test_slaves_file_ ./bin/slaves.sh ls
{code}
- The -ww flag to ps (ps auxw -ww) is not available on cygwin. It only affects screen output, so it can be dropped.
A side nit: I observed that SIGCONT doesn't seem to work on cygwin. That would make the lost-TaskTracker simulation test completely useless on cygwin.
- The randomness of the failures in the tests is rather peculiar, though admittedly it can be changed later if need be.
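A minimal sketch of the progress-reporting and LOG suggestions, assuming the test polls a job handle in a loop. `ProgressReporter`, `ProgressSource` and `waitForMapProgress` are hypothetical names, not part of the patch; `ProgressSource` stands in for the old-API `org.apache.hadoop.mapred.RunningJob` (which exposes `mapProgress()` and `reduceProgress()`), and `java.util.logging` stands in for the commons-logging LOG that Hadoop code normally uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.logging.Logger;

/**
 * Sketch of periodic progress reporting for a long-running test job.
 * ProgressSource is a hypothetical stand-in for the old-API RunningJob.
 */
class ProgressReporter {
  // java.util.logging keeps the sketch self-contained; Hadoop code would
  // normally use a commons-logging LOG instead.
  private static final Logger LOG = Logger.getLogger(ProgressReporter.class.getName());

  /** Hypothetical minimal view of a job's progress (0.0f to 1.0f). */
  public interface ProgressSource {
    float mapProgress();
    float reduceProgress();
  }

  /**
   * Polls until map progress reaches threshold, logging a progress line on
   * every reportEvery-th poll instead of staying silent. Returns the logged
   * lines so callers (and tests) can inspect them.
   */
  static List<String> waitForMapProgress(ProgressSource job, float threshold,
                                         long pollMillis, int reportEvery) {
    List<String> reports = new ArrayList<>();
    int polls = 0;
    float map;
    while ((map = job.mapProgress()) < threshold) {
      if (polls % reportEvery == 0) {
        String msg = String.format(Locale.ROOT,
            "map %.0f%% reduce %.0f%% (waiting for %.0f%%)",
            map * 100, job.reduceProgress() * 100, threshold * 100);
        LOG.info(msg); // goes to the log, not System.out
        reports.add(msg);
      }
      polls++;
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status and stop waiting
        break;
      }
    }
    return reports;
  }
}
```

In the real test, a small adapter would wrap the `RunningJob` (whose progress methods throw `IOException`) in a `ProgressSource`; reporting only every Nth poll keeps the log readable for large jobs.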
> Create a test that would inject random failures for tasks in large jobs and would also inject TaskTracker failures
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4939
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, test
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.20.0
>
>         Attachments: 4939.1.patch, 4939.patch
>
>
> Create a test that would inject random failures for tasks in large jobs and would also inject TaskTracker failures