Jean-Adrien:

For kicks, I tried your test on hadoop 0.19.0 (and hbase trunk). Using the default for *dfs.datanode.socket.write.timeout*, 8 minutes (Scenario B in your message to the hadoop list), I started up a loaded cluster. After all had settled, I counted the DataXceiver threads in each datanode: every one of my datanodes had more than 100. I let it sit for more than 8 minutes. After that, the datanodes had but one or two DataXceiver threads each (I could see all the ERROR timeouts tripping in the datanode log). I then started up a scan of all content in the table. It ran without issue, with no exceptions in the regionserver logs. The number of DataXceiver threads came and went over the life of the scan.
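
For anyone wanting to repeat the thread count above, here is a minimal sketch of one way to do it. It is not necessarily how the count was taken here. It assumes you have saved a jstack-style thread dump of the datanode JVM to a string, that each thread's header line starts with its quoted name, and that the xceiver threads carry "DataXceiver" in that name. The function name is made up for illustration.

```python
def count_dataxceiver_threads(thread_dump: str) -> int:
    """Count threads whose name mentions DataXceiver in a jstack-style dump.

    Assumption: each thread is introduced by one header line that begins
    with the thread name in double quotes. Stack-frame lines are indented
    and start with "at", so they are not counted even if they mention the
    DataXceiver class.
    """
    count = 0
    for line in thread_dump.splitlines():
        if line.startswith('"') and "DataXceiver" in line:
            count += 1
    return count


if __name__ == "__main__":
    import sys

    # Read a saved thread dump from stdin and print the xceiver thread count.
    print(count_dataxceiver_threads(sys.stdin.read()))
```

Run it once right after startup and again after the write timeout has passed to compare the two counts.
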
So, there is still the big datanode memory/thread pressure on startup, and there is still the extra latency of reestablishing timed-out readers. Scenario C, where you shut down hbase, for me also shuts down all resources in the datanode.

St.Ack

On Sun, Jan 11, 2009 at 11:34 PM, stack <[email protected]> wrote:

> Luo Ning, over the weekend, has made some comments you might be interested
> in over in HBASE-24 Jean-Adrien.
> St.Ack
>
> Jean-Adrien wrote:
>
>> Hi everybody,
>>
>> I saw that you put some advice concerning the Hadoop settings, for when
>> one hits the max xceivers limit, in the troubleshooting section of the
>> wiki.
>>
>> About this topic, I recently posted a question on the hadoop-core user
>> mailing list about their 'xcievers' thread behavior, since I still have
>> to increase their number as my HBase table grows, in order to avoid
>> reaching the limit at startup time. As a result my jvm uses a lot of
>> virtual memory (actually, with 500MB for the heap, 1100 threads allocate
>> 2GB of virtual memory). This eventually leads to swapping and failure.
>>
>> Here is the link to my post, with a graph showing the number of threads
>> the datanode creates when I start hbase.
>> http://www.nabble.com/xceiverCount-limit-reason-td21349807.html#a21352818
>>
>> You can see that all threads are created at HBase startup time and, if
>> the timeout (dfs.datanode.socket.write.timeout) is set, they all end
>> with a timeout failure.
>>
>> The question for HBase is: why are the connections with hadoop kept open
>> (and the threads as well)? Does it happen only in my case?
>> I think that Slava has the same problem. But I don't think everybody
>> does, since the cluster could not run without disabling the timeout
>> parameter dfs.datanode.socket.write.timeout.
>>
>> Has anybody made those observations?
>> Thanks
>>
>> Jean-Adrien
