[ https://issues.apache.org/jira/browse/MAPREDUCE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021854#comment-13021854 ]
Owen O'Malley commented on MAPREDUCE-2413:
------------------------------------------

The comment on get_value should be:

{code}
/*
 * Function used to get a configuration value.
 * On the first call it populates the configuration details into an
 * array; subsequent calls reuse the populated array.
 *
 * Memory returned here should be freed using free.
 */
{code}

free_values should be commented as:

{code}
// free an entry set of values
void free_values(char** values) {
  // check values itself before dereferencing it
  if (values != NULL && *values != NULL) {
    // the values were tokenized from the same malloc, so freeing the first
    // frees the entire block.
    free(*values);
  }
  if (values != NULL) {
    free(values);
  }
}
{code}

> TaskTracker should handle disk failures at both startup and runtime
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2413
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2413
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task-controller, tasktracker
>    Affects Versions: 0.20.204.0
>            Reporter: Bharath Mundlapudi
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.204.0
>
>         Attachments: MR-2413.v0.1.patch, MR-2413.v0.patch
>
>
> At present, TaskTracker doesn't handle disk failures properly at either
> startup or runtime.
> (1) Currently TaskTracker doesn't come up if any of the mapred-local-dirs is
> on a bad disk. TaskTracker should ignore that particular mapred-local-dir,
> start up, and use only the remaining good mapred-local-dirs.
> (2) If a disk goes bad while TaskTracker is running, currently TaskTracker
> doesn't do anything special. This results in either
>   (a) TaskTracker continuing to "try to use that bad disk", which results
> in lots of task failures and possibly job failures (because of multiple TTs
> having bad disks) and eventually in these TTs getting graylisted for all
> jobs. This needs a manual restart of TT with a modified configuration of
> mapred-local-dirs that avoids the bad disk.
>   OR
>   (b) the health check script identifying the disk as bad and the TT getting
> blacklisted. This also needs a manual restart of TT with a modified
> configuration of mapred-local-dirs that avoids the bad disk.
> This JIRA is to make TaskTracker more fault-tolerant to disk failures, solving
> (1) and (2), i.e. TT should start as long as at least one of the
> mapred-local-dirs is on a good disk, and TT should adjust its in-memory list
> of mapred-local-dirs and avoid using bad mapred-local-dirs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
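
For readers following the free_values comment above, here is a minimal sketch of the "tokenized from the same malloc" layout it relies on. The helper name split_values and the comma delimiter are assumptions made for illustration; this is not the task-controller's actual get_values code. The point is only that every token points into one allocated block whose start is values[0], so freeing the first entry and then the array releases everything.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): split a comma-separated
 * string into a NULL-terminated array of tokens. All tokens live in a
 * single malloc'd copy of the input, and values[0] is always the start
 * of that copy.
 */
char **split_values(const char *raw) {
  if (raw == NULL) {
    return NULL;
  }
  size_t len = strlen(raw);
  size_t count = 1;                 /* one more token than commas */
  for (size_t i = 0; i < len; i++) {
    if (raw[i] == ',') {
      count++;
    }
  }
  char *block = malloc(len + 1);
  char **values = malloc((count + 1) * sizeof(char *));
  if (block == NULL || values == NULL) {
    free(block);                    /* free(NULL) is a no-op */
    free(values);
    return NULL;
  }
  memcpy(block, raw, len + 1);      /* copy including the terminator */
  size_t n = 0;
  values[n++] = block;              /* first token starts the block */
  for (size_t i = 0; i < len; i++) {
    if (block[i] == ',') {
      block[i] = '\0';              /* terminate the previous token */
      values[n++] = &block[i + 1];
    }
  }
  values[n] = NULL;
  return values;
}

/* Free an entry set of values produced by split_values. */
void free_values(char **values) {
  if (values == NULL) {
    return;
  }
  /* All tokens share one block, so freeing the first frees them all. */
  free(values[0]);
  free(values);
}
```

Splitting by hand rather than with strtok keeps values[0] equal to the start of the block even when the input begins with a delimiter, which is what makes freeing only the first entry safe in this sketch.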