On Fri, Sep 24, 2010 at 10:18 AM, Sharma, Avani <[email protected]> wrote:
> My HBase is 0.20.6. hadoop is 0.20.2.
> We have a 3 node cluster with master, namenode, jobtracker, tasktracker, 
> datanode and regionserver on one machine and the other two machines are 
> tasktracker, datanode and regionserver.
> The heap size for all 3 regionservers (only) is 4GB.
>

Can you add more nodes to your cluster?

4GB is likely too small to carry a tasktracker, datanode, and regionserver
and still leave space for the system.

The two factors above are probably the cause of your non-smooth running.


> In hdfs-site.xml, dfs.datanode.max.xcievers = 2048

Do more.  4096 to be safe.
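
For example, in hdfs-site.xml on each datanode (note the property name keeps
its historical misspelling, "xcievers"):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

The datanodes need a restart to pick it up.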

> dfs.datanode.socket.write.timeout = 0 (to avoid socket timeout errors. Is 
> this needed with this version of HBase?)
>

Probably not.


>  ulimit -n = 2048
>

Do more.
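
For example, if your hadoop/hbase daemons run as a user named "hadoop"
(adjust the user to match your setup), add something like this to
/etc/security/limits.conf, then log in again before restarting the daemons:

  hadoop  -  nofile  32768

The usual recommendation is 32k or so; 1024/2048 runs out fast once a
regionserver and datanode are both busy.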

>  I have gone through number of emails in the mailing list, but have not been 
> able to resolve this issue at my end. Any help is appreciated.
>
>
> 2010-09-24 06:19:35,206 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Deleting block blk_517680959608157971_70974 file 
> /ebay/hadoop/data/current/subd
> ir48/blk_517680959608157971
> 2010-09-24 06:19:35,239 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.110.210.11:50010, 
> storageID=DS-1397488018-10.110.210.11
> -50010-1281421533434, infoPort=50075, ipcPort=50020):Got exception while 
> serving blk_-6923756801300423893_70891 to /10.110.210.13:
> java.io.IOException: Block blk_-6923756801300423893_70891 is not valid.
>        at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
>        at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
>        at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
>        at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>        at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>

Grep these blocks in the namenode logs to try and figure out what's been
going on with them (look for deletes w/o closes of the hosting file, that
kinda thing).
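
Something like the following, with the log path adjusted to wherever your
namenode writes its logs (the path below is only a placeholder):

  grep blk_-6923756801300423893 /var/log/hadoop/hadoop-*-namenode-*.log

Do the same for each block id that shows up in the 'is not valid' messages.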


> 2010-09-24 06:19:35,239 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.110.210.11:50010, storageID=DS-1397488018-10.110.210.1
> 1-50010-1281421533434, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: Block blk_-6923756801300423893_70891 is not valid.
>        at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
>        at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
>        at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
>        at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>        at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
>
> I restarted the regionserver. Before the restart, below is the error from 
> shutting down:
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
> /hbase/VRS/compaction.dir/945316184/5069182003336368746 File does not exist. 
> [Lease.
>  Holder: DFSClient_-511126949, pendingcreates: 1]


Trace the history of this file in the regionserver logs to see what happened
to it.  It looks like you experienced double assignment... the same region
hosted by two servers.  Check the master log for the history of the hosting
region.  See if you can confirm.
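
For example, grep the master log for the region's name (or its encoded name,
945316184 in the path above); the log location below is only a placeholder:

  grep 945316184 /var/log/hbase/hbase-*-master-*.log

If the assignment/open messages show two different regionservers hosting that
region at the same time, that would confirm the double assignment.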

St.Ack

>        at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1332)
>        at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1323)
>        at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1251)
>        at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>
> After the region server restarted, I get the following:
> 2010-09-24 04:31:39,149 WARN org.apache.hadoop.hbase.regionserver.Store: 
> Failed open of 
> hdfs://tnsardev01.vip.ebay.com/hbase/VRS/47647742/data/1134106899916871771.2021370808;
>  presumption is that fi
> le was corrupted at flush and lost edits picked up by commit log replay. 
> Verify!
> java.io.IOException: Cannot open filename 
> /hbase/VRS/2021370808/data/1134106899916871771
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
>        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
>        at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
>        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.<init>(HFile.java:731)
>        at 
> org.apache.hadoop.hbase.io.HalfHFileReader.<init>(HalfHFileReader.java:66)
>        at 
> org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:266)
>
>
> and also
> 2010-09-24 05:42:15,333 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.NotServingRegionException:
>
>
> and a number of
> 2010-09-24 06:35:32,452 WARN org.apache.hadoop.hbase.regionserver.Store: Not 
> in setorg.apache.hadoop.hbase.regionserver.storescan...@48cf41af
> 2010-09-24 06:35:45,481 WARN org.apache.hadoop.hbase.regionserver.Store: Not 
> in setorg.apache.hadoop.hbase.regionserver.storescan...@cfc7ecf
> 2010-09-24 06:35:57,357 WARN org.apache.hadoop.hbase.regionserver.Store: Not 
> in setorg.apache.hadoop.hbase.regionserver.storescan...@68fe0234
> 2010-09-24 06:36:10,807 WARN org.apache.hadoop.hbase.regionserver.Store: Not 
> in setorg.apache.hadoop.hbase.regionserver.storescan...@2829fd48
> 2010-09-24 06:36:25,665 WARN org.apache.hadoop.hbase.regionserver.Store: Not 
> in setorg.apache.hadoop.hbase.regionserver.storescan...@64755a16
> 2010-09-24 06:36:35,236 WARN org.apache.hadoop.hbase.regionserver.Store: Not 
> in setorg.apache.hadoop.hbase.regionserver.storescan...@7463874c
> 2010-09-24 06:36:41,557 WARN org.apache.hadoop.hbase.regionserver.Store: Not 
> in setorg.apache.hadoop.hbase.regionserver.storescan...@10f706e7
>
