Thanks Rodionov & Enis for responding. I agree with you that we need to upgrade.
As I mentioned in my first mail, we are in the process of upgrading.

>> >>> We are using hbase version 0.90.6 (please don't complain of old
>> >>> version. we are in process of upgrading)

- The suboptimal (as per me) code snippets I posted in my follow-up mail hold good for trunk as well (a minimal sketch of the fix I have in mind is at the bottom of this mail).
- I strongly feel this issue has something to do with the HBase version. I verified the code paths of the stack I posted, and I don't see any significant changes to this code (Flusher - getCompressor) in the current version.

On Tue, Mar 18, 2014 at 2:30 AM, Enis Söztutar <[email protected]> wrote:
> Hi,
>
> Agreed with Vladimir. I doubt anybody will spend the time to debug the issue; it would be easier if you can upgrade your HBase cluster. You will have to upgrade your Hadoop cluster as well. You should go with 0.96.x/0.98.x and either Hadoop 2.2 or Hadoop 2.3. Check out the HBase book for the upgrade process.
>
> Enis
>
> On Mon, Mar 17, 2014 at 11:19 AM, Vladimir Rodionov <[email protected]> wrote:
>
>> I think 0.90.6 reached EOL a couple of years ago. The best you can do right now is start planning an upgrade to the latest stable 0.94 or 0.96.
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [email protected]
>>
>> ________________________________________
>> From: Salabhanjika S [[email protected]]
>> Sent: Monday, March 17, 2014 2:55 AM
>> To: [email protected]
>> Subject: Re: Region server slowdown
>>
>> @Devs, please respond if you can provide me some hints on this problem.
>>
>> I did some more analysis. While going through the code in the stack trace I noticed something sub-optimal. This may not be the root cause of our slowdown, but I felt it may be worth optimizing/fixing.
>>
>> HBase is requesting a compressor *WITHOUT* a config object. This results in a configuration reload for every call. Should this call pass the existing config object as a parameter so that the configuration reload (discovery & XML parsing) does not happen so frequently?
>>
>> http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup
>> {code}
>> 309   public Compressor getCompressor() {
>> 310     CompressionCodec codec = getCodec(conf);
>> 311     if (codec != null) {
>> 312       Compressor compressor = CodecPool.getCompressor(codec);
>> 313       if (compressor != null) {
>> {code}
>>
>> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup
>> {code}
>> 162   public static Compressor getCompressor(CompressionCodec codec) {
>> 163     return getCompressor(codec, null);
>> 164   }
>> {code}
>>
>> On Fri, Mar 14, 2014 at 1:47 PM, Salabhanjika S <[email protected]> wrote:
>> > Thanks for the quick response, Ted.
>> >
>> > - Hadoop version is 0.20.2
>> > - Other previous flushes (600MB to 1.5GB) take around 60 to 300 seconds
>> >
>> > On Fri, Mar 14, 2014 at 1:21 PM, Ted Yu <[email protected]> wrote:
>> >> What Hadoop version are you using?
>> >>
>> >> Btw, the sentence about previous flushes was incomplete.
>> >>
>> >> Cheers
>> >>
>> >> On Mar 14, 2014, at 12:12 AM, Salabhanjika S <[email protected]> wrote:
>> >>
>> >>> Devs,
>> >>>
>> >>> We are using HBase version 0.90.6 (please don't complain of the old version; we are in the process of upgrading) in production, and we are noticing a strange problem arbitrarily every few weeks.
>> >>> The region server goes extremely slow, and we have to restart it once this happens. There is no unique pattern to this problem: it happens on different region servers, different tables/regions and at different times.
>> >>>
>> >>> Here are observations & findings from our analysis:
>> >>>
>> >>> - We are using LZO compression (0.4.10).
>> >>>
>> >>> - [RS Dashboard] The flush has been running for more than 6 hours and has been in "creating writer" status for a long time. Other previous flushes (600MB to 1.5GB) takes
>> >>>
>> >>> - [Thread dumps] No deadlocks. Flusher thread stack below; even the compactor thread is in the same state (Configuration.loadResource).
>> >>>
>> >>> "regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007efd016c4800 nid=0x35e9 runnable [0x00007efcad9c5000]
>> >>>    java.lang.Thread.State: RUNNABLE
>> >>>     at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
>> >>>     at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
>> >>>     - locked <0x00007f02ccc2ef78> (a sun.net.www.protocol.file.FileURLConnection)
>> >>>     at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
>> >>>     ... [cutting down some of the stack to keep the mail compact; all of it is in com.sun.org.apache.xerces...]
>> >>>     at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
>> >>>     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
>> >>>     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
>> >>>     at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1259)
>> >>>     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1200)
>> >>>     - locked <0x00007f014f1543b8> (a org.apache.hadoop.conf.Configuration)
>> >>>     at org.apache.hadoop.conf.Configuration.get(Configuration.java:501)
>> >>>     at com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205)
>> >>>     at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204)
>> >>>     at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>> >>>     at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>> >>>     at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
>> >>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>> >>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>> >>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>> >>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>> >>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>> >>>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>> >>>     at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:530)
>> >>>     - locked <0x00007efe1b6e7af8> (a java.lang.Object)
>> >>>     at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:496)
>> >>>     at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:83)
>> >>>     at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1576)
>> >>>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1046)
>> >>>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:967)
>> >>>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:915)
>> >>>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:394)
>> >>>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:368)
>> >>>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:242)
>> >>>
>> >>> Any leads on this please?
>> >>>
>> >>> -S
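
For clarity, here is a minimal, self-contained sketch of the kind of change I am suggesting. The class and method names below are only illustrative; the real change would be inside Compression.Algorithm.getCompressor() (line 312 in the trunk snippet above), which already has a Configuration at hand since it calls getCodec(conf). The idea is simply to use the two-argument CodecPool.getCompressor(codec, conf) overload so that the pooled compressor's reinit() sees an already-loaded conf, instead of triggering the Configuration.loadResource() XML parsing visible in the flusher thread dump.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Compressor;

// Illustrative sketch only -- not an actual HBase patch.
public class CompressorConfSketch {

  /**
   * Mirrors what Compression.Algorithm.getCompressor() does today, except that
   * the caller's existing Configuration is passed through to the codec pool.
   * The one-argument CodecPool.getCompressor(codec) overload delegates to
   * getCompressor(codec, null), and with a null conf the LZO compressor's
   * reinit() path appears to end up reading a Configuration whose XML
   * resources have not been loaded yet -- hence the repeated loadResource()
   * work for every block the flusher writes.
   */
  static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
    if (codec == null) {
      return null;
    }
    // Reuse the already-parsed conf instead of forcing a reload per call.
    return CodecPool.getCompressor(codec, conf);
  }
}
{code}

If that reasoning holds, the actual fix in Compression.java would just be changing CodecPool.getCompressor(codec) to CodecPool.getCompressor(codec, conf) on line 312.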
