On Wed, Apr 7, 2010 at 7:55 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> If you have block caching turned off this would be expected... What
> does hbase-site.xml look like?

We have block caching on. I don't think we have any read traffic. Is it
also used for things like compactions? I will post a link to the full RS
log and config tomorrow.

There are multiple messages like the following:

----
2010-04-06 22:09:01,720 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of campaign,4229\x233\x2320100308,1268123385426 because global memstore limit of 1.2g exceeded; currently 757.6m and flushing till 748.4m
2010-04-06 22:09:01,996 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of site,16611\x234\x23201003,1270121579470 because global memstore limit of 1.2g exceeded; currently 753.7m and flushing till 748.4m
2010-04-06 22:09:02,279 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of site,15093\x232\x2320100313,1269107719612 because global memstore limit of 1.2g exceeded; currently 749.9m and flushing till 748.4m
----

Our RS conf seems to be pretty memory starved. We are increasing memory
and will make initial tweaks based on the HBase wikis.

Raghu.

On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi <rang...@apache.org> wrote:
> > We are working with a small HBase cluster (5 nodes) with fairly beefy
> > nodes. While looking at why all the regionservers died at one time, we
> > noticed that these servers read some files hundreds of times a second.
> > This may not be the cause of the error... but do you think this is odd?
> >
> > HBase version: 0.20.1. The cluster was handling mainly write traffic.
> > Note that in the datanode log, there are a lot of reads of these files.
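For reference, the "global memstore limit of 1.2g" in those messages is a fraction of the regionserver heap, so raising -Xmx in hbase-env.sh is the main lever. A sketch of the related hbase-site.xml properties (names as in the 0.20-era hbase-default.xml; the values below are illustrative, not recommendations -- verify against your own defaults file):

```xml
<!-- Illustrative fragment of hbase-site.xml; tune against your heap size. -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>
  <!-- Forced flushes ("global memstore limit ... exceeded") start when
       total memstore usage passes this fraction of the RS heap. -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.35</value>
  <!-- Flushing continues until usage drops below this fraction
       (the "flushing till 748.4m" figure in the log). -->
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.2</value>
  <!-- Fraction of heap given to the block cache; reads during
       compactions go through HDFS either way. -->
</property>
```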
> >
> > One of the RS logs:
> > ---
> > 2010-04-06 21:51:33,923 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: campaign,4522\x234\x23201003,1268865840941
> > 2010-04-06 21:51:34,211 INFO org.apache.hadoop.hbase.regionserver.HRegion: region campaign,4522\x234\x23201003,1268865840941/407724784 available; sequence id is 1607026498
> > 2010-04-06 21:51:43,327 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_8972126557191254374_1090962 from any node: java.io.IOException: No live nodes contain current block
> > 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-5586169098563059270_1078171 from any node: java.io.IOException: No live nodes contain current block
> > 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-7610953303919156937_1089667 from any node: java.io.IOException: No live nodes contain current block
> > [...]
> > ----
> >
> > Portion of a grep for one of the blocks mentioned above in the datanode log:
> > ----
> > 39725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 97000
> > 2010-04-06 21:51:43,307 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.0.72:50010, dest: /10.10.0.72:43699, bytes: 6976, op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 76000
> > 2010-04-06 21:51:43,310 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.0.72:50010, dest: /10.10.0.72:45123, bytes: 6976, op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 93000
> > 2010-04-06 21:51:43,314 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.0.72:50010, dest: /10.10.0.72:41891, bytes: 6976, op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 267000
> > 2010-04-06 21:51:43,318 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.0.72:50010, dest: /10.10.0.72:46412, bytes: 6976, op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 91000
> > 2010-04-06 21:51:46,330 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.0.72:50010, dest: /10.10.0.72:40657, bytes: 6976, op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 85000
> > ------
> >
> > There are thousands of repeated reads of many small files like this.
> >
> > From the NN log, this block was created for /hbase/.META./1028785192/info/1728561479703335912:
> > ----
> > 2010-04-06 21:51:20,906 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/.META./1028785192/info/1728561479703335912. blk_8972126557191254374_1090962
> > ----
> >
> > Btw, we had replication set to 1 for this file by mistake.
> >
> > Thanks for taking a look.
> > Raghu.
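For anyone who hits the same single-replica mistake: the "No live nodes contain current block" errors are what you'd see when the one datanode holding a replica is down or overloaded. A sketch of how to check and repair the replication factor with the standard Hadoop FsShell (the path here is copied from the NN log line above; adjust the target replication to your cluster's default):

```shell
# Check the current replication factor of the store file (%r = replication).
hadoop fs -stat %r /hbase/.META./1028785192/info/1728561479703335912

# Raise replication back to 3 for everything under /hbase, recursively.
hadoop fs -setrep -R 3 /hbase
```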