[ https://issues.apache.org/jira/browse/HADOOP-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624543#action_12624543 ]
Lohit Vijayarenu commented on HADOOP-2676:
------------------------------------------
bq. +1 to Runping's comments.
Should we also think about supporting this for DataNodes? We have been thinking
about blacklisting faulty datanodes. The Namenode could consider a blacklisted
datanode equivalent to a node whose decommission is in progress. There is also
the question of un-blacklisting these nodes: does rebooting them make them
clean again and remove them from the blacklist?
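
A minimal, hypothetical sketch of the analogy above. NodeAdmissionPolicy,
AdminState, and both method names are illustrative assumptions, not existing
Namenode code; it only shows the intended behavior: a blacklisted datanode,
like one whose decommission is in progress, receives no new block replicas
but keeps serving the blocks it already holds.

{code:java}
// Hypothetical sketch only -- NodeAdmissionPolicy and AdminState are
// illustrative names, not existing Namenode code. A BLACKLISTED datanode
// is handled like one whose decommission is in progress: no new replicas
// are placed on it, but the blocks it already holds stay readable until
// they are re-replicated elsewhere.
public class NodeAdmissionPolicy {

  /** Hypothetical administrative states for a datanode. */
  enum AdminState { NORMAL, DECOMMISSION_IN_PROGRESS, BLACKLISTED }

  /** New block replicas may only be placed on NORMAL nodes. */
  static boolean canPlaceNewReplica(AdminState state) {
    return state == AdminState.NORMAL;
  }

  /** Existing blocks remain readable in all three states. */
  static boolean canServeExistingBlocks(AdminState state) {
    return true;
  }

  public static void main(String[] args) {
    System.out.println(canPlaceNewReplica(AdminState.BLACKLISTED));     // false
    System.out.println(canServeExistingBlocks(AdminState.BLACKLISTED)); // true
  }
}
{code}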
> Maintaining cluster information across multiple job submissions
> ---------------------------------------------------------------
>
> Key: HADOOP-2676
> URL: https://issues.apache.org/jira/browse/HADOOP-2676
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.15.2
> Reporter: Lohit Vijayarenu
>
> Could we have a way to maintain cluster state across multiple job submissions?
> Consider a scenario where we run multiple jobs in iteration on a cluster back
> to back. The nature of the job is the same, but the input/output might differ.
> Now, if a node is blacklisted in one iteration of the job run, it would be
> useful to maintain this information and blacklist that node for the next
> iteration of the job as well.
> Another situation we saw: if a node has fewer than mapred.map.max.attempts
> failures in each iteration, it is never marked for blacklisting. But if we
> consider two or three iterations together, such nodes fail tasks in every job
> and should be taken out of the cluster; otherwise they hamper the overall
> performance of the job.
> Could we have config variables, something which matches a job type (provided
> by the user) and maintains the cluster status for that job type alone?