[jira] [Commented] (MAPREDUCE-2413) TaskTracker should handle disk failures at both startup and runtime

Owen O'Malley (JIRA) Tue, 19 Apr 2011 16:30:48 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021854#comment-13021854 ]


Owen O'Malley commented on MAPREDUCE-2413:
------------------------------------------

The comment on get_value should be:

{code}
 /*
  * Gets a configuration value.
  * The first call populates the configuration details into an array;
  * subsequent calls use the already-populated array.
  *
  * The memory returned here should be freed using free().
  */
{code}

free_values should be commented as:

{code}
// free an entire set of values
void free_values(char** values) {
  // check the array pointer before dereferencing it
  if (values != NULL) {
    // the values were tokenized from the same malloc, so freeing the first
    // frees the entire block. free(NULL) is a no-op, so an empty set is safe.
    free(values[0]);
    free(values);
  }
}
{code}
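
To illustrate why freeing the first entry is enough, here is a minimal sketch of a tokenizer that satisfies that contract. The name split_values, the comma delimiter, and the choice to keep empty tokens are assumptions for illustration and are not taken from the patch. The point is that values[0] must always be the start of the single copied buffer; a tokenizer that skips leading delimiters (as strtok does) would break that assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not from the patch: splits a comma-separated string
 * into a NULL-terminated array. Every token points into ONE copied buffer,
 * and values[0] is always the start of that buffer. Empty tokens are kept. */
char **split_values(const char *input) {
  assert(input != NULL);

  /* Copy the input so it can be cut up in place. */
  size_t len = strlen(input) + 1;
  char *buffer = malloc(len);
  if (buffer == NULL) {
    return NULL;
  }
  memcpy(buffer, input, len);

  /* One token, plus one more per comma. */
  size_t count = 1;
  for (const char *p = buffer; *p != '\0'; p++) {
    if (*p == ',') {
      count++;
    }
  }

  char **values = malloc((count + 1) * sizeof(char *));
  if (values == NULL) {
    free(buffer);
    return NULL;
  }

  /* Replace each comma with a terminator and record the token starts. */
  size_t i = 0;
  values[i++] = buffer;
  for (char *p = buffer; *p != '\0'; p++) {
    if (*p == ',') {
      *p = '\0';
      values[i++] = p + 1;
    }
  }
  values[i] = NULL;
  return values;
}

/* Frees an entire set of values produced by split_values. */
void free_values(char **values) {
  if (values != NULL) {
    free(values[0]); /* releases the shared token buffer */
    free(values);    /* releases the pointer array */
  }
}
```

A caller would pair split_values("/a,/b,/c") with exactly one free_values call on the result and must not free individual tokens.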

> TaskTracker should handle disk failures at both startup and runtime
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2413
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2413
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task-controller, tasktracker
>    Affects Versions: 0.20.204.0
>            Reporter: Bharath Mundlapudi
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.204.0
>
>         Attachments: MR-2413.v0.1.patch, MR-2413.v0.patch
>
>
> At present, TaskTracker doesn't handle disk failures properly, either at 
> startup or at runtime.
> (1) Currently TaskTracker doesn't come up if any of the mapred-local-dirs is 
> on a bad disk. TaskTracker should ignore that particular mapred-local-dir, 
> start up, and use only the remaining good mapred-local-dirs.
> (2) If a disk goes bad while TaskTracker is running, TaskTracker currently 
> doesn't do anything special. This results in either
>    (a) TaskTracker continuing to try to use the bad disk, which causes many 
> task failures and possibly job failures (because multiple TTs have bad 
> disks), and eventually these TTs get graylisted for all jobs. This requires 
> a manual restart of the TT with a modified configuration of 
> mapred-local-dirs that avoids the bad disk. OR
>    (b) The health check script identifying the disk as bad and the TT getting 
> blacklisted. This also requires a manual restart of the TT with a modified 
> configuration of mapred-local-dirs that avoids the bad disk.
> This JIRA is to make TaskTracker more tolerant of disk failures by solving 
> (1) and (2), i.e. the TT should start as long as at least one of the 
> mapred-local-dirs is on a good disk, and the TT should adjust its in-memory 
> list of mapred-local-dirs to avoid using bad mapred-local-dirs.
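
The behavior requested in the description (skip bad local dirs, keep the good ones, and fail only when none remain) can be sketched in C as follows. This is an illustration under stated assumptions, not the patch: the names is_usable_dir and keep_usable_dirs are hypothetical, and "usable" is assumed to mean an existing directory that the current user can read, write, and search. The real check in TaskTracker may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical check: a path is usable if it exists, is a directory, and
 * the current user has read, write, and search permission on it. */
static int is_usable_dir(const char *path) {
  struct stat st;
  if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode)) {
    return 0;
  }
  return access(path, R_OK | W_OK | X_OK) == 0;
}

/* Compacts the NULL-terminated array `dirs` in place so that only usable
 * entries remain, preserving their order. Returns the number kept. */
size_t keep_usable_dirs(const char **dirs) {
  assert(dirs != NULL);
  size_t kept = 0;
  for (size_t i = 0; dirs[i] != NULL; i++) {
    if (is_usable_dir(dirs[i])) {
      dirs[kept++] = dirs[i];
    }
  }
  dirs[kept] = NULL;
  return kept;
}
```

Under this sketch, a caller would refuse to start only when keep_usable_dirs returns 0, and could rerun it periodically to drop directories that go bad at runtime.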

