[ 
https://issues.apache.org/jira/browse/HADOOP-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661503#action_12661503
 ] 

Vinod K V commented on HADOOP-4939:
-----------------------------------

A few comments:
 - The patch breaks compilation because of the change to the ClusterStatus 
constructor:
  -- src/mapred/org/apache/hadoop/mapred/LocalJobRunner.java:389
  -- src/test/org/apache/hadoop/mapred/TestJobQueueTaskScheduler.java:138
 - When SleepJob (or rather, the examples) is not on the path, the run fails, 
but with the following output:
        {code}
            JOB org.apache.hadoop.examples.SleepJob failed to run
            Waiting for the job org.apache.hadoop.examples.SleepJob to start
        {code}
    We should avoid printing the last line, *if* we can, since the job has 
already failed to run by then.
 - We could report the progress of the jobs every once in a while when running 
the tests. At present the test prints nothing until the progress reaches the 
threshold values.
 - I think writing these statements to a LOG is better than printing them to 
standard output.
 - With a HOD allocation, the test case that simulates lost TaskTrackers fails 
even though the keys are set up. This is because hadoop-daemons.sh tries to 
change to HADOOP_HOME on the remote nodes, where that directory does not exist:
        {code}
            exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
        {code}
    A simple solution would be to drop all the changes to hadoop-daemon.sh 
and hadoop-daemons.sh and simply use slaves.sh as follows:
        {code}
            HOSTLIST=conf/_reliability_test_slaves_file_ ./bin/slaves.sh ls
        {code}
 - The -ww flag to ps (ps auxw -ww) is not available on Cygwin. It only 
modifies the screen output and can be dropped. A side note: SIGCONT doesn't 
seem to work on Cygwin, which would make the lost TaskTracker simulation test 
completely useless there.
 - The way failures are randomized in the tests is rather peculiar, though 
admittedly that can be changed later if need be.
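
To illustrate the two suggestions above about periodic progress reporting and 
writing to a LOG, here is a rough sketch of how they might be combined. It is 
not taken from the patch: ProgressReporter, waitForProgress and its parameters 
are hypothetical names, the progress source is simulated, and 
java.util.logging only stands in for whatever logger the test actually uses.

```java
import java.util.function.DoubleSupplier;
import java.util.logging.Logger;

// Hypothetical helper for illustration only; not part of the patch.
final class ProgressReporter {
    // Assumption: java.util.logging stands in for the test's real logger.
    private static final Logger LOG =
        Logger.getLogger(ProgressReporter.class.getName());

    /**
     * Polls the progress source until it reaches the threshold, logging the
     * value on every poll. Returns the number of polls made, or -1 if the
     * threshold was not reached within maxPolls or the thread was interrupted.
     */
    static int waitForProgress(DoubleSupplier progress, double threshold,
                               long pollMillis, int maxPolls) {
        for (int polls = 1; polls <= maxPolls; polls++) {
            double current = progress.getAsDouble();
            LOG.info(String.format("Job progress %.0f%%, waiting for %.0f%%",
                current * 100, threshold * 100));
            if (current >= threshold) {
                return polls;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated job that advances by 25% each time it is polled.
        double[] done = {0.0};
        int polls = waitForProgress(() -> done[0] += 0.25, 0.5, 10L, 10);
        LOG.info("Threshold reached after " + polls + " polls");
    }
}
```

In the real test the supplier would presumably wrap whatever call already 
returns the job's progress, and the poll interval would be much longer than 
the 10 ms used here.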

> Create a test that would inject random failures for tasks in large jobs and 
> would also inject TaskTracker failures
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4939
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, test
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.20.0
>
>         Attachments: 4939.1.patch, 4939.patch
>
>
> Create a test that would inject random failures for tasks in large jobs and 
> would also inject TaskTracker failures

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
