Maintaining cluster information across multiple job submissions
---------------------------------------------------------------

                 Key: HADOOP-2676
                 URL: https://issues.apache.org/jira/browse/HADOOP-2676
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.15.2
            Reporter: lohit vijayarenu


Could we have a way to maintain cluster state across multiple job submissions?
Consider a scenario where we run multiple jobs iteratively, back to back, on a
cluster. The nature of each job is the same, but the input/output might differ.

Now, if a node is blacklisted in one iteration of the job run, it would be 
useful to retain this information and blacklist the node for the next 
iteration as well. 
Another situation we have seen: if a few nodes have fewer failures than 
mapred.map.max.attempts in each iteration, they are never marked for 
blacklisting. But if we consider two or three iterations together, these nodes 
cause failures in every job and should be taken out of the cluster. Leaving 
them in hampers the overall performance of the jobs.

Could we have a config variable that identifies a job type (provided by the 
user) and maintains the cluster status for that job type alone? 
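
To make the request concrete, here is a minimal sketch of the bookkeeping 
described above, written in Python for brevity. It is not Hadoop code and does 
not use any Hadoop API: the class name, the cumulative_threshold setting, the 
job-type strings and the node names are all hypothetical, chosen only to 
illustrate failure counts being kept per job type across submissions.

```python
from collections import defaultdict


class CrossJobBlacklist:
    """Hypothetical tracker: accumulates per-node task failures per job type."""

    def __init__(self, cumulative_threshold):
        # Hypothetical setting: total failures across jobs of one type
        # after which a node is treated as blacklisted for that type.
        self.cumulative_threshold = cumulative_threshold
        # job_type -> node -> failures summed over all recorded jobs
        self._failures = defaultdict(lambda: defaultdict(int))

    def record_job(self, job_type, failures_by_node):
        """Fold one finished job's per-node failure counts into the totals."""
        for node, count in failures_by_node.items():
            self._failures[job_type][node] += count

    def blacklisted(self, job_type):
        """Return the nodes whose cumulative failures reached the threshold."""
        # .get avoids creating an entry for a job type that was never recorded
        totals = self._failures.get(job_type, {})
        return {n for n, c in totals.items() if c >= self.cumulative_threshold}


state = CrossJobBlacklist(cumulative_threshold=4)
# Three iterations of the same job type; node-b fails twice each time,
# which stays under a per-job limit of 4 but adds up across iterations.
for _ in range(3):
    state.record_job("nightly-aggregation", {"node-a": 0, "node-b": 2})
flagged = state.blacklisted("nightly-aggregation")
other = state.blacklisted("adhoc-query")
```

Keying the counts by job type means a node that misbehaves for one kind of 
job is not penalised for unrelated jobs, which is the isolation asked for 
above.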

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.