[ https://issues.apache.org/jira/browse/HADOOP-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528859 ]
Raghu Angadi commented on HADOOP-1874:
--------------------------------------

Christian, it looks like you might be using only one of the four disks for logging. If you still want to keep the normal namenode log, you could point the logs directory at one of the unused disks so that it does not conflict with the edits log (a sketch of such a layout follows below the quoted issue).

> lost task trackers -- jobs hang
> -------------------------------
>
>                 Key: HADOOP-1874
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1874
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.15.0
>            Reporter: Christian Kunz
>            Assignee: Devaraj Das
>            Priority: Blocker
>         Attachments: lazy-dfs-ops.1.patch, lazy-dfs-ops.2.patch, lazy-dfs-ops.4.patch, lazy-dfs-ops.patch, server-throttle-hack.patch
>
>
> This happens on a 1400-node cluster running a recent nightly build patched with HADOOP-1763 (which fixes a previous 'lost task tracker' issue), on a c++-pipes job with 4200 maps and 2800 reduces. Task trackers start to get lost in large numbers as the job nears completion.
> Similar non-pipes jobs do not show the same problem, but it is unclear whether the issue is related to c++-pipes. It could also be dfs overload when the reduce tasks close and validate all newly created dfs files. I do see dfs client rpc timeout exceptions, but that alone does not explain the escalation in lost task trackers.
> I also noticed that the job tracker becomes rather unresponsive, with rpc timeout and call queue overflow exceptions. The Job Tracker is running with 60 handlers.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
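
A minimal sketch of the kind of change Raghu is describing, assuming the namenode machine has four data disks mounted at hypothetical paths /disk1 through /disk4, with the edits log living under dfs.name.dir on /disk1. HADOOP_LOG_DIR in conf/hadoop-env.sh controls where the daemon logs are written:

    # conf/hadoop-env.sh
    # Send the namenode (and other daemon) logs to an otherwise idle disk,
    # so they no longer share a spindle with the edits log under dfs.name.dir.
    # /disk3 is a hypothetical mount point; substitute one of your unused disks.
    export HADOOP_LOG_DIR=/disk3/hadoop/logs

The daemons pick up the new log location on restart; the edits log location itself is left untouched.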