Devs,

We are running HBase 0.90.6 in production (please don't complain about
the old version; we are in the process of upgrading), and every few
weeks we hit a strange problem: a region server becomes extremely slow.
Once this happens, we have to restart the region server. There is no
clear pattern to the problem. It happens on different region servers,
on different tables/regions, and at different times.

Here are the observations and findings from our analysis.
- We are using LZO compression (0.4.10).

- [RS Dashboard] A flush has been running for more than 6 hours. It has
been in the "creating writer" status for a long time. Other previous
flushes (600MB to 1.5GB) takes

- [Thread dumps] No deadlocks. The flusher thread's stack is below. The
compactor thread is in the same state, inside Configuration.loadResource.
"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007efd016c4800 nid=0x35e9 runnable [0x00007efcad9c5000]
   java.lang.Thread.State: RUNNABLE
    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
    - locked <0x00007f02ccc2ef78> (a sun.net.www.protocol.file.FileURLConnection)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
    ... [cutting down some stack to keep mail compact. all this stack is in com.sun.org.apache.xerces...]
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1259)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1200)
    - locked <0x00007f014f1543b8> (a org.apache.hadoop.conf.Configuration)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:501)
    at com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205)
    at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
    at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
    at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:530)
    - locked <0x00007efe1b6e7af8> (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:496)
    at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:83)
    at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1576)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1046)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:967)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:915)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:394)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:368)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:242)
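
In case it helps anyone check for the same symptom, here is a minimal sketch of the two checks above (deadlock detection, and finding threads currently inside a given frame) using the standard JDK ThreadMXBean API. The class and method names (StackFrameFinder, deadlockedThreadCount, threadsInFrame) are made up for illustration and are not part of HBase or Hadoop. Note that it only inspects the JVM it runs in, so to look at a region server it would have to run in-process or be adapted to a remote JMX connection.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase/Hadoop: inspects the current JVM only.
class StackFrameFinder {

    /** Number of threads the JVM reports as deadlocked (0 if none). */
    public static int deadlockedThreadCount() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // returns null when there is no deadlock
        return ids == null ? 0 : ids.length;
    }

    /**
     * Names (with state) of all threads whose stack contains a frame whose
     * class name ends with classSuffix and whose method name equals method.
     */
    public static List<String> threadsInFrame(String classSuffix, String method) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<String>();
        // false, false: skip monitor/synchronizer details, we only need the frames
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().endsWith(classSuffix)
                        && frame.getMethodName().equals(method)) {
                    names.add(info.getThreadName() + " [" + info.getThreadState() + "]");
                    break; // one match per thread is enough
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
        // Frame taken from the dump above: Configuration.loadResource
        System.out.println("threads in Configuration.loadResource: "
                + threadsInFrame("conf.Configuration", "loadResource"));
    }
}
```

Running this repeatedly and seeing the same threads in the same frame each time would indicate that they are busy in (or repeatedly re-entering) that code path rather than blocked on a lock.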

Any leads on this please?

-S
