@Devs, please respond if you can provide some hints on this problem. I did some more analysis. While going through the code in the stack trace I noticed something sub-optimal. This may not be the root cause of our slowdown, but it seems worth optimizing/fixing.
HBase is making a call to get a Compressor *WITHOUT* a config object. This results in a configuration reload (discovery & XML parsing) on every call. Should it pass the existing config object as a parameter so that the reload does not happen so frequently?

http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup (lines 309-313)
{code}
public Compressor getCompressor() {
  CompressionCodec codec = getCodec(conf);
  if (codec != null) {
    Compressor compressor = CodecPool.getCompressor(codec);
    if (compressor != null) {
{code}

http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup (lines 162-164)
{code}
public static Compressor getCompressor(CompressionCodec codec) {
  return getCompressor(codec, null);
}
{code}
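If this theory holds, a minimal sketch of the fix could look like the following (assuming, per the CodecPool snippet above, that the two-argument getCompressor(CompressionCodec, Configuration) overload forwards the Configuration to the pooled compressor's reinit()):

{code}
public Compressor getCompressor() {
  CompressionCodec codec = getCodec(conf);
  if (codec != null) {
    // Pass the existing, already-parsed Configuration through rather than
    // letting the one-argument overload forward null; with null, the pooled
    // compressor's reinit() path ends up re-discovering and re-parsing the
    // configuration XML (see the Configuration.loadResource frames in the
    // stack trace quoted below).
    Compressor compressor = CodecPool.getCompressor(codec, conf);
    if (compressor != null) {
      // ... rest unchanged ...
{code}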
On Fri, Mar 14, 2014 at 1:47 PM, Salabhanjika S <[email protected]> wrote:
> Thanks for the quick response Ted.
>
> - Hadoop version is 0.20.2
> - Other previous flushes (600MB to 1.5GB) take around 60 to 300 seconds
>
> On Fri, Mar 14, 2014 at 1:21 PM, Ted Yu <[email protected]> wrote:
>> What Hadoop version are you using?
>>
>> Btw, the sentence about previous flushes was incomplete.
>>
>> Cheers
>>
>> On Mar 14, 2014, at 12:12 AM, Salabhanjika S <[email protected]> wrote:
>>
>>> Devs,
>>>
>>> We are using HBase version 0.90.6 (please don't complain about the old
>>> version; we are in the process of upgrading) in our production, and
>>> every few weeks we arbitrarily hit a strange problem: a region server
>>> becomes extremely slow.
>>> We have to restart the region server once this happens. There is no
>>> unique pattern to this problem. It happens on different region
>>> servers, different tables/regions, and at different times.
>>>
>>> Here are observations & findings from our analysis.
>>> - We are using LZO compression (0.4.10).
>>>
>>> - [RS Dashboard] A flush has been running for more than 6 hours. It
>>> has been in "creating writer" status for a long time. Other previous
>>> flushes (600MB to 1.5GB) takes
>>>
>>> - [Thread dumps] No deadlocks. Flusher thread stack below; even the
>>> compactor thread is in the same state (Configuration.loadResource).
>>>
>>> "regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007efd016c4800 nid=0x35e9 runnable [0x00007efcad9c5000]
>>>    java.lang.Thread.State: RUNNABLE
>>>    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
>>>    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
>>>    - locked <0x00007f02ccc2ef78> (a sun.net.www.protocol.file.FileURLConnection)
>>>    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
>>>    ... [cutting down some stack to keep mail compact. all this stack is in com.sun.org.apache.xerces...]
>>>    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
>>>    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
>>>    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
>>>    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1259)
>>>    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1200)
>>>    - locked <0x00007f014f1543b8> (a org.apache.hadoop.conf.Configuration)
>>>    at org.apache.hadoop.conf.Configuration.get(Configuration.java:501)
>>>    at com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205)
>>>    at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204)
>>>    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>>    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>>    at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
>>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>>    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>>>    at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:530)
>>>    - locked <0x00007efe1b6e7af8> (a java.lang.Object)
>>>    at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:496)
>>>    at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:83)
>>>    at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1576)
>>>    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1046)
>>>    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:967)
>>>    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:915)
>>>    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:394)
>>>    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:368)
>>>    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:242)
>>>
>>> Any leads on this please?
>>>
>>> -S
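P.S. To sanity-check the theory that the repeated Configuration loads are the expensive part, here is a small standalone sketch one could run (the class name ConfReloadCost is made up for illustration; it assumes only the stock org.apache.hadoop.conf.Configuration API, and that each fresh instance re-parses its XML resources on first get(), as the stack above suggests):

{code}
import org.apache.hadoop.conf.Configuration;

public class ConfReloadCost {
  public static void main(String[] args) {
    final int iterations = 1000;

    // Fresh Configuration per call: the first get() on each instance
    // triggers resource discovery and XML parsing (the
    // Configuration.loadResources() frames in the stack above).
    long t0 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      new Configuration().get("io.compression.codecs");
    }
    long freshNanos = System.nanoTime() - t0;

    // Shared Configuration: resources are parsed once, then served
    // from the cached Properties object.
    Configuration shared = new Configuration();
    t0 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      shared.get("io.compression.codecs");
    }
    long reusedNanos = System.nanoTime() - t0;

    System.out.println("fresh per call:  " + (freshNanos / iterations / 1000) + " us");
    System.out.println("reused per call: " + (reusedNanos / iterations / 1000) + " us");
  }
}
{code}

If the fresh-instance loop is orders of magnitude slower, that would support passing the existing conf through as proposed above.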
