[ http://issues.apache.org/jira/browse/HADOOP-210?page=comments#action_12415008 ]
Devaraj Das commented on HADOOP-210:
------------------------------------

I am implementing this. For now I am using NIO only for client accepts and the subsequent reads from the clients. The handler threads write the output/response directly to the clients concerned. Clients are disconnected if they don't communicate within a certain timeout. Timeout intervals could, in principle, differ between protocols (e.g., DFS datanodes' heartbeats versus client leases), so for now I am assuming a single maximum timeout for IPC communication (read from the conf file) that applies to all RPC protocols. The server keeps track of when each client last communicated with it (either through a TCP connect or a TCP write). Comments?

> Namenode not able to accept connections
> ----------------------------------------
>
>          Key: HADOOP-210
>          URL: http://issues.apache.org/jira/browse/HADOOP-210
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>  Environment: linux
>     Reporter: Mahadev konar
>     Assignee: Mahadev konar
>
> I am running Owen's random writer on a 627-node cluster (writing 10 GB/node).
> After running for a while (map 12%, reduce 1%), I get the following error on
> the Namenode:
>
> Exception in thread "Server listener on port 60000"
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:574)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:105)
>
> After this, the namenode does not seem to be accepting connections from any
> of the clients, and all the DFSClient calls time out. Here is a trace for one
> of them:
>
> java.net.SocketTimeoutException: timed out waiting for rpc response
>         at org.apache.hadoop.ipc.Client.call(Client.java:305)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:149)
>         at org.apache.hadoop.dfs.$Proxy1.open(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:419)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:406)
>         at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:171)
>         at org.apache.hadoop.dfs.DistributedFileSystem.openRaw(DistributedFileSystem.java:78)
>         at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
>         at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:228)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
>         at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:43)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:105)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:785)
>
> The namenode has around 1% CPU utilization at this time (after the
> OutOfMemoryError was thrown). I have profiled the NameNode and it
> seems to be using a maximum heap size of around 57 MB (which is not much), so
> heap size does not seem to be the problem. Might it be happening due to a lack
> of stack space? Any pointers?
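For readers less familiar with java.nio, below is a minimal, self-contained sketch of the design Devaraj describes in the comment above: a single selector thread handles accepts and reads (instead of one thread per connection, which is what exhausts native threads in Server$Listener), a per-connection last-activity timestamp is refreshed on connect and on read, and connections idle longer than a single configured maximum timeout are dropped. This is an illustration only, not the actual HADOOP-210 patch; the class name, port, timeout constant, and dispatchToHandler hand-off are all hypothetical.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NioListenerSketch {
    // Hypothetical stand-in for the value read from the conf file;
    // not a real Hadoop configuration key.
    private static final long MAX_IDLE_MILLIS = 60 * 1000L;

    // Last time each client connected or wrote to us.
    private final Map<SocketChannel, Long> lastActivity =
        new ConcurrentHashMap<SocketChannel, Long>();

    public void run() throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel acceptor = ServerSocketChannel.open();
        acceptor.configureBlocking(false);
        acceptor.socket().bind(new InetSocketAddress(60000));
        acceptor.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(8192);
        while (true) {
            // Wake up at least once a second so idle checks run
            // even when there is no I/O.
            selector.select(1000);
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel ch = acceptor.accept();
                    if (ch == null) continue;
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                    // TCP connect counts as activity.
                    lastActivity.put(ch, System.currentTimeMillis());
                } else if (key.isReadable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    try {
                        buf.clear();
                        int n = ch.read(buf);
                        if (n < 0) {          // client closed its end
                            close(key, ch);
                        } else if (n > 0) {
                            // TCP write from the client counts as activity.
                            lastActivity.put(ch, System.currentTimeMillis());
                            buf.flip();
                            dispatchToHandler(ch, buf);
                        }
                    } catch (IOException e) {
                        close(key, ch);
                    }
                }
            }
            disconnectIdleClients(selector);
        }
    }

    // Drop clients that have not communicated within the maximum timeout.
    private void disconnectIdleClients(Selector selector) {
        long now = System.currentTimeMillis();
        for (SelectionKey key : selector.keys()) {
            if (!(key.channel() instanceof SocketChannel)) continue;
            SocketChannel ch = (SocketChannel) key.channel();
            Long last = lastActivity.get(ch);
            if (last != null && now - last > MAX_IDLE_MILLIS) {
                close(key, ch);
            }
        }
    }

    private void close(SelectionKey key, SocketChannel ch) {
        lastActivity.remove(ch);
        key.cancel();
        try { ch.close(); } catch (IOException ignored) {}
    }

    // Hypothetical hand-off point: a pool of handler threads would
    // deserialize the call and write the response directly back to the
    // channel themselves, as described above. A real server would copy
    // the request bytes first, since buf is reused across reads.
    private void dispatchToHandler(SocketChannel ch, ByteBuffer request) {
        // omitted in this sketch
    }

    public static void main(String[] args) throws IOException {
        new NioListenerSketch().run();
    }
}

Note that the idle scan walks selector.keys() and cancels keys in place; NIO defers the actual removal of cancelled keys until the next select(), so this is safe from a single selector thread.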
