@Devs, please respond if you can provide some hints on this problem. I did some more analysis. While going through the code in the stack trace I noticed something sub-optimal. This may not be the root cause of our slowdown, but it seems worth optimizing/fixing.
HBase is making a call to get a Compressor *WITHOUT* a config object. This results in a configuration reload (discovery & XML parsing) on every call. Should it pass the existing config object as a parameter so that the reload does not happen so frequently?

http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup (lines 309-313)
{code}
public Compressor getCompressor() {
  CompressionCodec codec = getCodec(conf);
  if (codec != null) {
    Compressor compressor = CodecPool.getCompressor(codec);
    if (compressor != null) {
{code}

http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup (lines 162-164)
{code}
public static Compressor getCompressor(CompressionCodec codec) {
  return getCompressor(codec, null);
}
{code}
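If this theory holds, a minimal sketch of the fix could look like the following (assuming, per the CodecPool snippet above, that the two-argument getCompressor(CompressionCodec, Configuration) overload forwards the Configuration to the pooled compressor's reinit()):

{code}
public Compressor getCompressor() {
  CompressionCodec codec = getCodec(conf);
  if (codec != null) {
    // Pass the existing, already-parsed Configuration through rather than
    // letting the one-argument overload forward null; with null, the pooled
    // compressor's reinit() path ends up re-discovering and re-parsing the
    // configuration XML (see the Configuration.loadResource frames in the
    // stack trace quoted below).
    Compressor compressor = CodecPool.getCompressor(codec, conf);
    if (compressor != null) {
      // ... rest unchanged ...
{code}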
On Fri, Mar 14, 2014 at 1:47 PM, Salabhanjika S <[email protected]> wrote:
> Thanks for the quick response Ted.
>
> - Hadoop version is 0.20.2
> - Other previous flushes (600MB to 1.5GB) take around 60 to 300 seconds
>
> On Fri, Mar 14, 2014 at 1:21 PM, Ted Yu <[email protected]> wrote:
>> What Hadoop version are you using?
>>
>> Btw, the sentence about previous flushes was incomplete.
>>
>> Cheers
>>
>> On Mar 14, 2014, at 12:12 AM, Salabhanjika S <[email protected]> wrote:
>>
>>> Devs,
>>>
>>> We are using HBase version 0.90.6 (please don't complain about the old
>>> version; we are in the process of upgrading) in our production, and
>>> every few weeks we arbitrarily hit a strange problem: a region server
>>> becomes extremely slow.
>>> We have to restart the region server once this happens. There is no
>>> unique pattern to this problem. It happens on different region
>>> servers, different tables/regions, and at different times.
>>>
>>> Here are observations & findings from our analysis.
>>> - We are using LZO compression (0.4.10).
>>>
>>> - [RS Dashboard] A flush has been running for more than 6 hours. It
>>> has been in "creating writer" status for a long time. Other previous
>>> flushes (600MB to 1.5GB) takes
>>>
>>> - [Thread dumps] No deadlocks. Flusher thread stack below; even the
>>> compactor thread is in the same state (Configuration.loadResource).
>>>
>>> "regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007efd016c4800 nid=0x35e9 runnable [0x00007efcad9c5000]
>>>    java.lang.Thread.State: RUNNABLE
>>>    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
>>>    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
>>>    - locked <0x00007f02ccc2ef78> (a sun.net.www.protocol.file.FileURLConnection)
>>>    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
>>>    ... [cutting down some stack to keep mail compact. all this stack is in com.sun.org.apache.xerces...]
>>>    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
>>>    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
>>>    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
>>>    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1259)
>>>    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1200)
>>>    - locked <0x00007f014f1543b8> (a org.apache.hadoop.conf.Configuration)
>>>    at org.apache.hadoop.conf.Configuration.get(Configuration.java:501)
>>>    at com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205)
>>>    at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204)
>>>    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>>    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>>    at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
>>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>>    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>>>    at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:530)
>>>    - locked <0x00007efe1b6e7af8> (a java.lang.Object)
>>>    at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:496)
>>>    at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:83)
>>>    at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1576)
>>>    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1046)
>>>    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:967)
>>>    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:915)
>>>    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:394)
>>>    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:368)
>>>    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:242)
>>>
>>> Any leads on this please?
>>>
>>> -S
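P.S. To sanity-check the theory that the repeated Configuration loads are the expensive part, here is a small standalone sketch one could run (the class name ConfReloadCost is made up for illustration; it assumes only the stock org.apache.hadoop.conf.Configuration API, and that each fresh instance re-parses its XML resources on first get(), as the stack above suggests):

{code}
import org.apache.hadoop.conf.Configuration;

public class ConfReloadCost {
  public static void main(String[] args) {
    final int iterations = 1000;

    // Fresh Configuration per call: the first get() on each instance
    // triggers resource discovery and XML parsing (the
    // Configuration.loadResources() frames in the stack above).
    long t0 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      new Configuration().get("io.compression.codecs");
    }
    long freshNanos = System.nanoTime() - t0;

    // Shared Configuration: resources are parsed once, then served
    // from the cached Properties object.
    Configuration shared = new Configuration();
    t0 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      shared.get("io.compression.codecs");
    }
    long reusedNanos = System.nanoTime() - t0;

    System.out.println("fresh per call:  " + (freshNanos / iterations / 1000) + " us");
    System.out.println("reused per call: " + (reusedNanos / iterations / 1000) + " us");
  }
}
{code}

If the fresh-instance loop is orders of magnitude slower, that would support passing the existing conf through as proposed above.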
