On Wed, Apr 7, 2010 at 7:55 PM, Ryan Rawson <ryano...@gmail.com> wrote:

> If you have block caching turned off this would be expected... What
> does hbase-site.xml look like?


We have block caching on. I don't think we have any read traffic. Is it also
used for things like compactions?

I will post a link to the full RS log and config tomorrow.

There are multiple messages like the following:
----
2010-04-06 22:09:01,720 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
campaign,4229\x233\x2320100308,1268123385426 because global memstore limit
of 1.2g exceeded; currently 757.6m and flushing till 748.4m
2010-04-06 22:09:01,996 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
site,16611\x234\x23201003,1270121579470 because global memstore limit of
1.2g exceeded; currently 753.7m and flushing till 748.4m
2010-04-06 22:09:02,279 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
site,15093\x232\x2320100313,1269107719612 because global memstore limit of
1.2g exceeded; currently 749.9m and flushing till 748.4m
------

Our RS config seems pretty memory-starved. We are increasing memory and
will make initial tweaks based on the HBase wiki.
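
In case it helps, the global memstore marks are set in hbase-site.xml as
fractions of the RS heap. The property names below are what I believe 0.20
uses (`hbase.regionserver.global.memstore.upperLimit` / `lowerLimit`) — worth
double-checking against your release. With roughly a 3g heap, the defaults of
0.4/0.25 line up with the "1.2g" limit and ~750m low-water mark in the log
above:

```xml
<!-- hbase-site.xml: values are fractions of the region server heap.
     Property names assumed for 0.20; verify against your HBase release. -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>   <!-- forced flushes kick in above this mark -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.25</value>  <!-- flushing continues down to this mark -->
</property>
```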

Raghu.

On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi <rang...@apache.org> wrote:
> > We are working with a small HBase cluster (5 nodes) with fairly beefy
> nodes.
> > While looking at why all the regionservers died at one time, we noticed
> > that these servers read some files 100s of times a second. This may not
> > be the cause of the error... but do you think this is odd?
> >
> > HBase version : 0.20.1. The cluster was handling mainly write traffic.
> > Note that in the datanode log, there are a lot of reads of these files.
> >
> > One of RS logs:
> >  ---
> > 2010-04-06 21:51:33,923 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_OPEN:
> > campaign,4522\x234\x23201003,1268865840941
> > 2010-04-06 21:51:34,211 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > region campaign,4522\x234\x23201003,1268865840941/407724784 available;
> > sequence id is 1607026498
> > 2010-04-06 21:51:43,327 INFO org.apache.hadoop.hdfs.DFSClient: Could not
> > obtain block blk_8972126557191254374_1090962 from any node:
> >  java.io.IOException: No live nodes contain current block
> > 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not
> > obtain block blk_-5586169098563059270_1078171 from any node:
> >  java.io.IOException: No live nodes contain current block
> > 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not
> > obtain block blk_-7610953303919156937_1089667 from any node:
> >  java.io.IOException: No live nodes contain current block
> > [...]
> > ----
> >
> > portion of a grep for one of the blocks mentioned above in the datanode log:
> > ----
> > 39725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020,
> > blockid: blk_8972126557191254374_1090962, duration: 97000
> > 2010-04-06 21:51:43,307 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> > 10.10.0.72:50010, dest: /10.10.0.72:43699, bytes: 6976, op: HDFS_READ,
> > cliID: DFSClient_-1439725703, offset: 0, srvID:
> > DS-977430382-10.10.0.72-50010-1266601998020, blockid:
> > blk_8972126557191254374_1090962, duration: 76000
> > 2010-04-06 21:51:43,310 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> > 10.10.0.72:50010, dest: /10.10.0.72:45123, bytes: 6976, op: HDFS_READ,
> > cliID: DFSClient_-1439725703, offset: 0, srvID:
> > DS-977430382-10.10.0.72-50010-1266601998020, blockid:
> > blk_8972126557191254374_1090962, duration: 93000
> > 2010-04-06 21:51:43,314 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> > 10.10.0.72:50010, dest: /10.10.0.72:41891, bytes: 6976, op: HDFS_READ,
> > cliID: DFSClient_-1439725703, offset: 0, srvID:
> > DS-977430382-10.10.0.72-50010-1266601998020, blockid:
> > blk_8972126557191254374_1090962, duration: 267000
> > 2010-04-06 21:51:43,318 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> > 10.10.0.72:50010, dest: /10.10.0.72:46412, bytes: 6976, op: HDFS_READ,
> > cliID: DFSClient_-1439725703, offset: 0, srvID:
> > DS-977430382-10.10.0.72-50010-1266601998020, blockid:
> > blk_8972126557191254374_1090962, duration: 91000
> > 2010-04-06 21:51:46,330 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> > 10.10.0.72:50010, dest: /10.10.0.72:40657, bytes: 6976, op: HDFS_READ,
> > cliID: DFSClient_-1439725703, offset: 0, srvID:
> > DS-977430382-10.10.0.72-50010-1266601998020, blockid:
> > blk_8972126557191254374_1090962, duration: 85000
> > ------
> >
> > There are thousands of repeated reads of many small files like this.
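> >
> > To quantify the repeats, a quick sketch that tallies HDFS_READ lines per
> > block id from a datanode clienttrace log (the regex assumes lines shaped
> > like the excerpts above; adjust for your log format):
> >
> > ```python
> > import re
> > from collections import Counter
> >
> > # Match the block id on clienttrace HDFS_READ lines; block ids may be
> > # negative (e.g. blk_-5586169098563059270_1078171).
> > BLOCK_RE = re.compile(r"op: HDFS_READ.*?blockid: (blk_-?\d+_\d+)")
> >
> > def count_block_reads(lines):
> >     """Return a Counter of HDFS_READ occurrences keyed by block id."""
> >     counts = Counter()
> >     for line in lines:
> >         m = BLOCK_RE.search(line)
> >         if m:
> >             counts[m.group(1)] += 1
> >     return counts
> >
> > sample = [
> >     "... op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, "
> >     "blockid: blk_8972126557191254374_1090962, duration: 76000",
> >     "... op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, "
> >     "blockid: blk_8972126557191254374_1090962, duration: 93000",
> > ]
> > print(count_block_reads(sample).most_common(5))
> > ```
> >
> > Running it over the real log and sorting by count should show which
> > .META. blocks are being hammered.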
> >
> > --- From NN log, this block was created
> > for /hbase/.META./1028785192/info/1728561479703335912
> > 2010-04-06 21:51:20,906 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> > NameSystem.allocateBlock:
> /hbase/.META./1028785192/info/1728561479703335912.
> > blk_8972126557191254374_1090962
> > ----
> >
> > Btw, we had replication set to 1 for this file by mistake.
> >
> > thanks for taking a look.
> > Raghu.
> >
>
