Stack,

After we switched to a larger EC2 instance the problem is still there, and at the same time we found the following message in the datanode's log:
java.io.IOException: xceiverCount 1024 exceeds the limit of concurrent xcievers 1023
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:87)
        at java.lang.Thread.run(Thread.java:619)

Thanks very much for your help.

Regards,
Basil.
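P.S. If the xceiver limit itself has to go up, a minimal sketch of the datanode-side setting (hadoop-site.xml on Hadoop 0.19) might look like the following; the 4096 value is only an illustrative guess, not something we have verified, and the datanodes would need a restart to pick it up:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
  <description>Upper bound on the number of concurrent DataXceiver threads
  (HDFS block readers/writers) a datanode will run. HBase region servers keep
  many store files open, so this often needs to be higher than the default.
  The 4096 here is only an example value.</description>
</property>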
On Sat, Feb 28, 2009 at 10:41 AM, Xiaogang He <[email protected]> wrote:

> stack,
>
> Thanks for your reply, I really appreciate it.
>
> On Fri, Feb 27, 2009 at 11:49 PM, stack <[email protected]> wrote:
>
>> Tell us more about your hbase install? Number of servers, number of
>> regions, schema, general size of your cells and hbase version.
>
> We just have a small hadoop cluster with 1 master and 3 slaves, and a
> single hmaster and 1 regionserver; the version numbers are both 0.19.
>
>> The configuration that most directly affects the amount of heap used is
>> the one below:
>>
>> <property>
>>   <name>hbase.io.index.interval</name>
>>   <value>128</value>
>>   <description>The interval at which we record offsets in hbase
>>   store files/mapfiles. Default for stock mapfiles is 128. Index
>>   files are read into memory. If there are many of them, could prove
>>   a burden. If so play with the hadoop io.map.index.skip property and
>>   skip every nth index member when reading back the index into memory.
>>   Downside to high index interval is lowered access times.
>>   </description>
>>
>> You could try setting io.map.index.skip to 4 or 8 across your cluster and
>> restart.
>
> We have namenode/secondarynamenode/hmaster/regionserver running on a small
> EC2 instance (1.7G memory). We think it may be part of the problem, so we
> have switched to a larger instance now. We will try the above suggestion if
> we hit the problem again.
>
>> The flushing of the cache seems to be frustrated by an hdfs error in the
>> below. You have read the 'getting started' section and have upped your
>> ulimit file descriptors?
>
> Yes, we have changed the ulimit file descriptors according to the FAQ on
> the hbase official site.
>
> Thank you very much.
>
> Regards,
> Basil.
>
>> St.Ack
>>
>> On Thu, Feb 26, 2009 at 8:16 PM, Xiaogang He <[email protected]> wrote:
>>
>> > hi,
>> >
>> > I keep hitting the following exception after hbase is restarted and has
>> > been running for a while:
>> >
>> > 2009-02-26 15:14:04,827 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://hmaster:50001/hbase/log_10.249.190.85_1235626687854_60020/hlog.dat.1235679079054, entries=100053. New log writer: /hbase/log_10.249.190.85_1235626687854_60020/hlog.dat.1235679244824
>> > 2009-02-26 15:14:16,405 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,155123497688845858,1235539496917 because global memcache limit of 396.9m exceeded; currently 396.9m and flushing till 248.1m
>> > 2009-02-26 15:14:18,666 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,145928983691898633,1235539496917 because global memcache limit of 396.9m exceeded; currently 386.3m and flushing till 248.1m
>> > 2009-02-26 15:14:19,497 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_profiles,,1235562106563 because global memcache limit of 396.9m exceeded; currently 376.2m and flushing till 248.1m
>> > 2009-02-26 15:14:21,971 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,1859616112140717,1235538938447 because global memcache limit of 396.9m exceeded; currently 367.1m and flushing till 248.1m
>> > 2009-02-26 15:14:23,506 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,256848350134132138,1235539352160 because global memcache limit of 396.9m exceeded; currently 358.2m and flushing till 248.1m
>> > 2009-02-26 15:14:26,273 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_profiles,38395253911274047,1235562695944 because global memcache limit of 396.9m exceeded; currently 349.4m and flushing till 248.1m
>> > 2009-02-26 15:14:27,946 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_relationships,18320094988761441,1235659399900 because global memcache limit of 396.9m exceeded; currently 340.8m and flushing till 248.1m
>> > 2009-02-26 15:14:28,898 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_profiles,183105869903093166,1235658588032 because global memcache limit of 396.9m exceeded; currently 332.3m and flushing till 248.1m
>> > 2009-02-26 15:14:29,857 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_relationships,1279872936511407,1235563047231 because global memcache limit of 396.9m exceeded; currently 323.9m and flushing till 248.1m
>> > 2009-02-26 15:14:30,338 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,9374985809090827,1235658787938 because global memcache limit of 396.9m exceeded; currently 315.5m and flushing till 248.1m
>> > 2009-02-26 15:14:31,284 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.249.187.102:50010
>> > 2009-02-26 15:14:31,284 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-8226110948737137663_51382
>> > 2009-02-26 15:14:39,640 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
>> > 2009-02-26 15:14:39,640 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4802751471280593846_51382
>> > 2009-02-26 15:14:45,807 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.249.187.102:50010
>> > 2009-02-26 15:14:45,807 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3919223098697505175_51382
>> > 2009-02-26 15:14:51,813 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
>> > 2009-02-26 15:14:51,813 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6922144209752436228_51382
>> > 2009-02-26 15:14:57,827 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>> >
>> > 2009-02-26 15:14:57,845 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6922144209752436228_51382 bad datanode[0] nodes == null
>> > 2009-02-26 15:14:57,846 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Aborting...
>> > 2009-02-26 15:14:57,924 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown
>> > org.apache.hadoop.hbase.DroppedSnapshotException: region: 1002_profiles,9374985809090827,1235658787938
>> >         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:896)
>> >         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
>> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
>> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushSomeRegions(MemcacheFlusher.java:291)
>> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:261)
>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1614)
>> >         at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>> >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>> >
>> > I noticed there are some parameters regarding this, such as
>> > *hbase.regionserver.globalMemcache.upperLimit* and
>> > *hbase.regionserver.globalMemcache.lowerLimit*.
>> >
>> > I'm just using the default settings:
>> >
>> > <property>
>> >   <name>hbase.regionserver.globalMemcache.upperLimit</name>
>> >   <value>0.4</value>
>> >   <description>Maximum size of all memcaches in a region server before new
>> >   updates are blocked and flushes are forced. Defaults to 40% of heap.
>> >   </description>
>> > </property>
>> > <property>
>> >   <name>hbase.regionserver.globalMemcache.lowerLimit</name>
>> >   <value>0.25</value>
>> >   <description>When memcaches are being forced to flush to make room in
>> >   memory, keep flushing until we hit this mark. Defaults to 30% of heap.
>> >   This value equal to hbase.regionserver.globalmemcache.upperLimit causes
>> >   the minimum possible flushing to occur when updates are blocked due to
>> >   memcache limiting.
>> >   </description>
>> > </property>
>> >
>> > Could anyone please give me some guidance to help me out of this issue?
>> >
>> > Thanks,
>> > Basil.
