[ http://issues.apache.org/jira/browse/HADOOP-210?page=comments#action_12415008 ]
Devaraj Das commented on HADOOP-210:
------------------------------------

I am implementing this. For now I am using NIO only for client accepts and the subsequent reads from the clients. The handler threads write the output/response directly to the clients concerned. Clients are disconnected if they don't communicate within a certain timeout. Timeout intervals could, in principle, differ between protocols (e.g., DFS datanodes' heartbeats versus client leases), so for now I am assuming a single maximum timeout for IPC communication (read from the conf file) that applies to all RPC protocols. The server keeps track of when each client last communicated with it (either through a TCP connect or a TCP write). Comments?

> Namenode not able to accept connections
> ----------------------------------------
>
>          Key: HADOOP-210
>          URL: http://issues.apache.org/jira/browse/HADOOP-210
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>  Environment: linux
>     Reporter: Mahadev konar
>     Assignee: Mahadev konar
>
> I am running Owen's random writer on a 627-node cluster (writing 10 GB/node).
> After running for a while (map 12%, reduce 1%), I get the following error on
> the Namenode:
>
> Exception in thread "Server listener on port 60000"
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:574)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:105)
>
> After this, the namenode does not seem to be accepting connections from any
> of the clients, and all the DFSClient calls time out. Here is a trace for one
> of them:
>
> java.net.SocketTimeoutException: timed out waiting for rpc response
>         at org.apache.hadoop.ipc.Client.call(Client.java:305)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:149)
>         at org.apache.hadoop.dfs.$Proxy1.open(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:419)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:406)
>         at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:171)
>         at org.apache.hadoop.dfs.DistributedFileSystem.openRaw(DistributedFileSystem.java:78)
>         at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
>         at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:228)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
>         at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:43)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:105)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:785)
>
> The namenode has around 1% CPU utilization at this time (after the
> OutOfMemoryError was thrown). I have profiled the NameNode and it
> seems to be using a maximum heap size of around 57 MB (which is not much), so
> heap size does not seem to be the problem. Might it be happening due to a lack
> of stack space? Any pointers?
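For readers less familiar with java.nio, below is a minimal, self-contained sketch of the design Devaraj describes in the comment above: a single selector thread handles accepts and reads (instead of one thread per connection, which is what exhausts native threads in Server$Listener), a per-connection last-activity timestamp is refreshed on connect and on read, and connections idle longer than a single configured maximum timeout are dropped. This is an illustration only, not the actual HADOOP-210 patch; the class name, port, timeout constant, and dispatchToHandler hand-off are all hypothetical.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NioListenerSketch {
    // Hypothetical stand-in for the value read from the conf file;
    // not a real Hadoop configuration key.
    private static final long MAX_IDLE_MILLIS = 60 * 1000L;

    // Last time each client connected or wrote to us.
    private final Map<SocketChannel, Long> lastActivity =
        new ConcurrentHashMap<SocketChannel, Long>();

    public void run() throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel acceptor = ServerSocketChannel.open();
        acceptor.configureBlocking(false);
        acceptor.socket().bind(new InetSocketAddress(60000));
        acceptor.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(8192);
        while (true) {
            // Wake up at least once a second so idle checks run
            // even when there is no I/O.
            selector.select(1000);
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel ch = acceptor.accept();
                    if (ch == null) continue;
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                    // TCP connect counts as activity.
                    lastActivity.put(ch, System.currentTimeMillis());
                } else if (key.isReadable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    try {
                        buf.clear();
                        int n = ch.read(buf);
                        if (n < 0) {          // client closed its end
                            close(key, ch);
                        } else if (n > 0) {
                            // TCP write from the client counts as activity.
                            lastActivity.put(ch, System.currentTimeMillis());
                            buf.flip();
                            dispatchToHandler(ch, buf);
                        }
                    } catch (IOException e) {
                        close(key, ch);
                    }
                }
            }
            disconnectIdleClients(selector);
        }
    }

    // Drop clients that have not communicated within the maximum timeout.
    private void disconnectIdleClients(Selector selector) {
        long now = System.currentTimeMillis();
        for (SelectionKey key : selector.keys()) {
            if (!(key.channel() instanceof SocketChannel)) continue;
            SocketChannel ch = (SocketChannel) key.channel();
            Long last = lastActivity.get(ch);
            if (last != null && now - last > MAX_IDLE_MILLIS) {
                close(key, ch);
            }
        }
    }

    private void close(SelectionKey key, SocketChannel ch) {
        lastActivity.remove(ch);
        key.cancel();
        try { ch.close(); } catch (IOException ignored) {}
    }

    // Hypothetical hand-off point: a pool of handler threads would
    // deserialize the call and write the response directly back to the
    // channel themselves, as described above. A real server would copy
    // the request bytes first, since buf is reused across reads.
    private void dispatchToHandler(SocketChannel ch, ByteBuffer request) {
        // omitted in this sketch
    }

    public static void main(String[] args) throws IOException {
        new NioListenerSketch().run();
    }
}

Note that the idle scan walks selector.keys() and cancels keys in place; NIO defers the actual removal of cancelled keys until the next select(), so this is safe from a single selector thread.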
