Ok...
But I suggest you look at these links. They might help.

http://wiki.apache.org/hadoop/PerformanceTuning
http://java.sun.com/docs/hotspot/gc1.4.2/faq.html

HTH
-Mike

> Subject: RE: Region server goes away
> Date: Thu, 15 Apr 2010 10:34:34 -0700
> From: ghend...@decarta.com
> To: hbase-user@hadoop.apache.org
>
> No, I didn't make any changes. I doubt it is garbage collection related.
> This error happens immediately upon startup when nothing is accessing
> HBase, and the error continues periodically with seeming regularity.
> Also, I am running on 64 bit machines, with GBs of heap per hadoop
> process.
>
> -g
>
> -----Original Message-----
> From: Michael Segel [mailto:michael_se...@hotmail.com]
> Sent: Thursday, April 15, 2010 10:31 AM
> To: hbase-user@hadoop.apache.org
> Subject: RE: Region server goes away
>
> Did you make changes to your garbage collection?
> Could be that you've swamped your nodes and time out due to GC running.
>
> > Subject: RE: Region server goes away
> > Date: Thu, 15 Apr 2010 10:25:45 -0700
> > From: ghend...@decarta.com
> > To: hbase-user@hadoop.apache.org
> >
> > After making all the recommended config changes, the only issue I see
> > is this, in the zookeeper logs. It happens repeatedly. The HBase shell
> > seems to work fine, running it on the same machine as the zookeeper.
> > Any ideas? I reviewed a thread in the email list on this topic, but it
> > seemed inconclusive:
> >
> > 2010-04-15 04:14:36,048 WARN
> > org.apache.zookeeper.server.PrepRequestProcessor: Got exception when
> > processing sessionid:0x128012c809c0000 type:create cxid:0x4
> > zxid:0xfffffffffffffffe txntype:unknown n/a
> > org.apache.zookeeper.KeeperException$NodeExistsException:
> > KeeperErrorCode = NodeExists
> >         at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
> >         at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
> >
> > -----Original Message-----
> > From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of Stack
> > Sent: Wednesday, April 14, 2010 8:45 PM
> > To: hbase-user@hadoop.apache.org
> > Cc: Paul Mahon; Bill Brune; Shaheen Bahauddin; Rohit Nigam
> > Subject: Re: Region server goes away
> >
> > On Wed, Apr 14, 2010 at 8:27 PM, Geoff Hendrey <ghend...@decarta.com> wrote:
> > > Hi,
> > >
> > > I have posted previously about issues I was having with HDFS when I
> > > was running HBase and HDFS on the same box, both pseudoclustered. Now
> > > I have two very capable servers. I've set up HDFS with a datanode on
> > > each box. I've set up the namenode on one box, and the zookeeper and
> > > HBase master on the other box. Both boxes are region servers. I am
> > > using Hadoop 0.20.2 and HBase 0.20.3.
> >
> > What do you have for replication? If two datanodes, you've set it to
> > two rather than the default 3?
> >
> > > I have set dfs.datanode.socket.write.timeout to 0 in hbase-site.xml.
> >
> > This is probably not necessary.
> >
> > > I am running a mapreduce job with about 200 concurrent reducers,
> > > each of which writes into HBase, with 32,000 row flush buffers.
> >
> > Why don't you try with just a few reducers first and then build it up?
> > See if that works?
> >
> > > About 40% through the completion of my job, HDFS started showing one
> > > of the datanodes was dead (the one *not* on the same machine as the
> > > namenode).
> >
> > Do you think it dead -- what did a threaddump say? Or was it just that
> > you couldn't get into it? Any errors in the datanode logs complaining
> > about xceiver count, or perhaps you need to up the number of handlers?
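If it does turn out to be the xceiver or handler limit, those are raised in hdfs-site.xml on the datanodes. A minimal sketch, assuming the Hadoop 0.20-era property names (including the historical misspelling of "xcievers") and purely illustrative values:

    <!-- hdfs-site.xml on each datanode; restart the datanodes after changing -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>2048</value> <!-- default is 256, which HBase can exhaust quickly -->
    </property>
    <property>
      <name>dfs.datanode.handler.count</name>
      <value>10</value> <!-- default is 3 -->
    </property>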
> >
> > > I stopped HBase, and magically the datanode came back to life.
> > >
> > > Any suggestions on how to increase the robustness?
> > >
> > > I see errors like this in the datanode's log:
> > >
> > > 2010-04-14 12:54:58,692 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > > DatanodeRegistration(10.241.6.80:50010,
> > > storageID=DS-642079670-10.241.6.80-50010-1271178858027,
> > > infoPort=50075, ipcPort=50020):DataXceiver
> > > java.net.SocketTimeoutException: 480000 millis timeout while waiting
> > > for channel
> >
> > I believe this is harmless. It's just the DN timing out the socket -- you
> > set the timeout to 0 in hbase-site.xml rather than in hdfs-site.xml,
> > where it would have an effect. See HADOOP-3831 for detail.
> >
> > > to be ready for write. ch : java.nio.channels.SocketChannel[connected
> > > local=/10.241.6.80:50010 remote=/10.241.6.80:48320]
> > >         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> > >         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> > >         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> > >         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
> > >         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
> > >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
> > >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:...)
> > >
> > > Here I show the output of 'hadoop dfsadmin -report'. The first time it
> > > is invoked, all is well. The second time, one datanode is dead. The
> > > third time, the dead datanode has come back to life:
> > >
> > > [had...@dt1 ~]$ hadoop dfsadmin -report
> > > Configured Capacity: 1277248323584 (1.16 TB)
> > > Present Capacity: 1208326105528 (1.1 TB)
> > > DFS Remaining: 1056438108160 (983.88 GB)
> > > DFS Used: 151887997368 (141.46 GB)
> > > DFS Used%: 12.57%
> > > Under replicated blocks: 3479
> > > Blocks with corrupt replicas: 0
> > > Missing blocks: 0
> > >
> > > -------------------------------------------------
> > > Datanodes available: 2 (2 total, 0 dead)
> > >
> > > Name: 10.241.6.79:50010
> > > Decommission Status : Normal
> > > Configured Capacity: 643733970944 (599.52 GB)
> > > DFS Used: 75694104268 (70.5 GB)
> > > Non DFS Used: 35150238004 (32.74 GB)
> > > DFS Remaining: 532889628672 (496.29 GB)
> > > DFS Used%: 11.76%
> > > DFS Remaining%: 82.78%
> > > Last contact: Wed Apr 14 11:20:59 PDT 2010
> >
> > Yeah, my guess as per above is that the reporting client couldn't get
> > on to the datanode because handlers were full or xceivers exceeded.
> >
> > Let us know how it goes.
> >
> > St.Ack
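As an aside on the timeout Stack mentions: if the write timeout is to be disabled at all, the override belongs in hdfs-site.xml on the datanodes, not hbase-site.xml. A minimal sketch, using the same property name quoted in the question (0 disables the 480000 ms default that shows up in the log above):

    <!-- hdfs-site.xml; whether to disable this at all is debatable, see HADOOP-3831 -->
    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>0</value> <!-- 0 = never time out; default is 480000 ms (8 minutes) -->
    </property>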
> >
> > > Name: 10.241.6.80:50010
> > > Decommission Status : Normal
> > > Configured Capacity: 633514352640 (590.01 GB)
> > > DFS Used: 76193893100 (70.96 GB)
> > > Non DFS Used: 33771980052 (31.45 GB)
> > > DFS Remaining: 523548479488 (487.59 GB)
> > > DFS Used%: 12.03%
> > > DFS Remaining%: 82.64%
> > > Last contact: Wed Apr 14 11:14:37 PDT 2010
> > >
> > > [had...@dt1 ~]$ hadoop dfsadmin -report
> > > Configured Capacity: 643733970944 (599.52 GB)
> > > Present Capacity: 609294929920 (567.45 GB)
> > > DFS Remaining: 532876144640 (496.28 GB)
> > > DFS Used: 76418785280 (71.17 GB)
> > > DFS Used%: 12.54%
> > > Under replicated blocks: 3247
> > > Blocks with corrupt replicas: 0
> > > Missing blocks: 0
> > >
> > > -------------------------------------------------
> > > Datanodes available: 1 (2 total, 1 dead)
> > >
> > > Name: 10.241.6.79:50010
> > > Decommission Status : Normal
> > > Configured Capacity: 643733970944 (599.52 GB)
> > > DFS Used: 76418785280 (71.17 GB)
> > > Non DFS Used: 34439041024 (32.07 GB)
> > > DFS Remaining: 532876144640 (496.28 GB)
> > > DFS Used%: 11.87%
> > > DFS Remaining%: 82.78%
> > > Last contact: Wed Apr 14 11:28:38 PDT 2010
> > >
> > > Name: 10.241.6.80:50010
> > > Decommission Status : Normal
> > > Configured Capacity: 0 (0 KB)
> > > DFS Used: 0 (0 KB)
> > > Non DFS Used: 0 (0 KB)
> > > DFS Remaining: 0 (0 KB)
> > > DFS Used%: 100%
> > > DFS Remaining%: 0%
> > > Last contact: Wed Apr 14 11:14:37 PDT 2010
> > >
> > > [had...@dt1 ~]$ hadoop dfsadmin -report
> > > Configured Capacity: 1277248323584 (1.16 TB)
> > > Present Capacity: 1210726427080 (1.1 TB)
> > > DFS Remaining: 1055440003072 (982.96 GB)
> > > DFS Used: 155286424008 (144.62 GB)
> > > DFS Used%: 12.83%
> > > Under replicated blocks: 3338
> > > Blocks with corrupt replicas: 0
> > > Missing blocks: 0
> > >
> > > -------------------------------------------------
> > > Datanodes available: 2 (2 total, 0 dead)
> > >
> > > Name: 10.241.6.79:50010
> > > Decommission Status : Normal
> > > Configured Capacity: 643733970944 (599.52 GB)
> > > DFS Used: 77775145981 (72.43 GB)
> > > Non DFS Used: 33086850051 (30.81 GB)
> > > DFS Remaining: 532871974912 (496.28 GB)
> > > DFS Used%: 12.08%
> > > DFS Remaining%: 82.78%
> > > Last contact: Wed Apr 14 11:29:44 PDT 2010
> > >
> > > Name: 10.241.6.80:50010
> > > Decommission Status : Normal
> > > Configured Capacity: 633514352640 (590.01 GB)
> > > DFS Used: 77511278027 (72.19 GB)
> > > Non DFS Used: 33435046453 (31.14 GB)
> > > DFS Remaining: 522568028160 (486.68 GB)
> > > DFS Used%: 12.24%
> > > DFS Remaining%: 82.49%
> > > Last contact: Wed Apr 14 11:29:44 PDT 2010
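One more note on Stack's replication question: the persistent "Under replicated blocks" counts in all three reports are what you would expect from two datanodes running against the default replication factor of 3. If that is the cause, a minimal hdfs-site.xml sketch would be (illustrative only; it affects newly written files):

    <property>
      <name>dfs.replication</name>
      <value>2</value> <!-- match the number of datanodes actually available -->
    </property>

Files that are already written keep their old replication target, so something like 'hadoop fs -setrep -R 2 /' would be needed to bring the existing blocks down as well.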