See http://wiki.apache.org/hadoop/Hbase/Troubleshooting#5.
On your hitting the global memcache limit: you are uploading at the time, right? If so, it's probably fine. What does your schema look like, by the way?

The DroppedSnapshotException seen in your first message should also be addressed by #5 in the troubleshooting page above. Have you upped your ulimit file descriptors?
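If it is the xceiver ceiling you are hitting (the "xceiverCount 1024 exceeds the limit" message quoted below), the usual remedy is to raise dfs.datanode.max.xcievers (note the historical misspelling) in hdfs-site.xml on every datanode and restart the datanodes. A minimal sketch only; the 2047 here is an illustrative value, not a tested recommendation for your cluster:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <!-- illustrative value; size this to your region count and write load -->
    <value>2047</value>
  </property>

That change goes hand in hand with the ulimit bump on open file descriptors asked about above.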
St.Ack

On Tue, Mar 3, 2009 at 11:29 PM, Basil He <[email protected]> wrote:

> Stack,
>
> After we switched to a larger EC2 instance, the problem is still there,
> and at the same time we found the following message in the datanode's log:
>
> java.io.IOException: xceiverCount 1024 exceeds the limit of concurrent xcievers 1023
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:87)
>         at java.lang.Thread.run(Thread.java:619)
>
> Thanks very much for your help.
> Basil.
>
> On Sat, Feb 28, 2009 at 10:41 AM, Xiaogang He <[email protected]> wrote:
>
> > Stack,
> >
> > Thanks for your reply, I really appreciate it.
> >
> > On Fri, Feb 27, 2009 at 11:49 PM, stack <[email protected]> wrote:
> >
> >> Tell us more about your HBase install: number of servers, number of
> >> regions, schema, general size of your cells, and HBase version.
> >
> > We just have a small Hadoop cluster with 1 master and 3 slaves, plus a
> > single HMaster and 1 regionserver; both are version 0.19.
> >
> >> The configuration that most directly affects the amount of heap used
> >> is the one below:
> >>
> >> <property>
> >>   <name>hbase.io.index.interval</name>
> >>   <value>128</value>
> >>   <description>The interval at which we record offsets in hbase
> >>   store files/mapfiles. Default for stock mapfiles is 128. Index
> >>   files are read into memory. If there are many of them, could prove
> >>   a burden. If so play with the hadoop io.map.index.skip property and
> >>   skip every nth index member when reading back the index into memory.
> >>   Downside to high index interval is lowered access times.
> >>   </description>
> >> </property>
> >>
> >> You could try setting io.map.index.skip to 4 or 8 across your cluster
> >> and restart.
> >
> > We have the namenode, secondary namenode, HMaster, and regionserver all
> > running on a small EC2 instance (1.7 GB of memory). We think that is part
> > of the problem, so we have switched to a larger instance now. We will try
> > the suggestion above if we hit the problem again.
> >
> >> The flushing of the cache seems to be frustrated by an HDFS error in
> >> the log below. Have you read the 'Getting Started' section and upped
> >> your ulimit file descriptors?
> >
> > Yes, we have changed the ulimit file descriptors according to the FAQ on
> > the official HBase site.
> >
> > Thank you very much.
> >
> > Regards,
> > Basil.
> >
> >> St.Ack
> >>
> >> On Thu, Feb 26, 2009 at 8:16 PM, Xiaogang He <[email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I keep hitting the following exception after HBase is restarted and
> >> > has been running for a while:
> >> >
> >> > 2009-02-26 15:14:04,827 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://hmaster:50001/hbase/log_10.249.190.85_1235626687854_60020/hlog.dat.1235679079054, entries=100053. New log writer: /hbase/log_10.249.190.85_1235626687854_60020/hlog.dat.1235679244824
> >> > 2009-02-26 15:14:16,405 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,155123497688845858,1235539496917 because global memcache limit of 396.9m exceeded; currently 396.9m and flushing till 248.1m
> >> > 2009-02-26 15:14:18,666 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,145928983691898633,1235539496917 because global memcache limit of 396.9m exceeded; currently 386.3m and flushing till 248.1m
> >> > 2009-02-26 15:14:19,497 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_profiles,,1235562106563 because global memcache limit of 396.9m exceeded; currently 376.2m and flushing till 248.1m
> >> > 2009-02-26 15:14:21,971 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,1859616112140717,1235538938447 because global memcache limit of 396.9m exceeded; currently 367.1m and flushing till 248.1m
> >> > 2009-02-26 15:14:23,506 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,256848350134132138,1235539352160 because global memcache limit of 396.9m exceeded; currently 358.2m and flushing till 248.1m
> >> > 2009-02-26 15:14:26,273 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_profiles,38395253911274047,1235562695944 because global memcache limit of 396.9m exceeded; currently 349.4m and flushing till 248.1m
> >> > 2009-02-26 15:14:27,946 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_relationships,18320094988761441,1235659399900 because global memcache limit of 396.9m exceeded; currently 340.8m and flushing till 248.1m
> >> > 2009-02-26 15:14:28,898 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_profiles,183105869903093166,1235658588032 because global memcache limit of 396.9m exceeded; currently 332.3m and flushing till 248.1m
> >> > 2009-02-26 15:14:29,857 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_relationships,1279872936511407,1235563047231 because global memcache limit of 396.9m exceeded; currently 323.9m and flushing till 248.1m
> >> > 2009-02-26 15:14:30,338 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,9374985809090827,1235658787938 because global memcache limit of 396.9m exceeded; currently 315.5m and flushing till 248.1m
> >> > 2009-02-26 15:14:31,284 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.249.187.102:50010
> >> > 2009-02-26 15:14:31,284 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-8226110948737137663_51382
> >> > 2009-02-26 15:14:39,640 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
> >> > 2009-02-26 15:14:39,640 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4802751471280593846_51382
> >> > 2009-02-26 15:14:45,807 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.249.187.102:50010
> >> > 2009-02-26 15:14:45,807 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3919223098697505175_51382
> >> > 2009-02-26 15:14:51,813 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
> >> > 2009-02-26 15:14:51,813 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6922144209752436228_51382
> >> > 2009-02-26 15:14:57,827 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
> >> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
> >> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
> >> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
> >> >
> >> > 2009-02-26 15:14:57,845 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6922144209752436228_51382 bad datanode[0] nodes == null
> >> > 2009-02-26 15:14:57,846 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Aborting...
> >> > 2009-02-26 15:14:57,924 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown
> >> > org.apache.hadoop.hbase.DroppedSnapshotException: region: 1002_profiles,9374985809090827,1235658787938
> >> >         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:896)
> >> >         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
> >> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
> >> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushSomeRegions(MemcacheFlusher.java:291)
> >> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:261)
> >> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1614)
> >> >         at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> >> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >> >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> >> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> >> >
> >> > I noticed there are some parameters regarding this, such as
> >> > hbase.regionserver.globalMemcache.upperLimit and
> >> > hbase.regionserver.globalMemcache.lowerLimit.
> >> >
> >> > I'm just using the default settings:
> >> >
> >> > <property>
> >> >   <name>hbase.regionserver.globalMemcache.upperLimit</name>
> >> >   <value>0.4</value>
> >> >   <description>Maximum size of all memcaches in a region server before new
> >> >   updates are blocked and flushes are forced. Defaults to 40% of heap.
> >> >   </description>
> >> > </property>
> >> > <property>
> >> >   <name>hbase.regionserver.globalMemcache.lowerLimit</name>
> >> >   <value>0.25</value>
> >> >   <description>When memcaches are being forced to flush to make room in
> >> >   memory, keep flushing until we hit this mark. Defaults to 30% of heap.
> >> >   This value equal to hbase.regionserver.globalmemcache.upperLimit causes
> >> >   the minimum possible flushing to occur when updates are blocked due to
> >> >   memcache limiting.
> >> >   </description>
> >> > </property>
> >> >
> >> > Could anyone please give me some guidance to help me out of this issue?
> >> >
> >> > Thanks,
> >> > Basil.
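On the two watermarks asked about above: the burst of forced flushes is normal when the memcaches fill up under write load. If you do want to move them after giving the regionserver more heap, the overrides go in hbase-site.xml. A sketch only; the 0.5 and 0.35 below are purely illustrative values, not recommendations:

  <property>
    <name>hbase.regionserver.globalMemcache.upperLimit</name>
    <!-- illustrative only; the shipped default is 0.4 (40% of heap) -->
    <value>0.5</value>
  </property>
  <property>
    <name>hbase.regionserver.globalMemcache.lowerLimit</name>
    <!-- illustrative only; the shipped default is 0.25 -->
    <value>0.35</value>
  </property>

The HDFS errors interleaved with those flushes are the part that actually needs fixing.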
