I am attaching the jstack output I collected for the region servers that might have problems (grid07, grid08 and grid11).
grid07 was doing minor compaction:

"regionserver/10.202.50.107:60020.compactor" daemon prio=10 tid=0x000000004d150800 nid=0x857 runnable [0x0000000043e6a000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.deflateBytesDirect(Native Method)
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.compress(ZlibCompressor.java:315)
        - locked <0x00002aaac06338d8> (a org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor)
        at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:76)
        at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71)
        at org.apache.hadoop.hbase.io.hfile.Compression$FinishOnFlushCompressionStream.write(Compression.java:62)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
        - locked <0x00002aaabb870410> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        - locked <0x00002aaabb871468> (a java.io.DataOutputStream)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:522)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:481)
        at org.apache.hadoop.hbase.regionserver.MinorCompactingStoreScanner.next(MinorCompactingStoreScanner.java:96)
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:922)
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:765)
        - locked <0x00002aaac2f09d28> (a java.lang.Object)
        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:833)
        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:786)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:93)

On Wed, Nov 24, 2010 at 4:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Hi,
> We use 0.20.6 to process large amount of data:
> FILE_BYTES_WRITTEN 132,953,083,977
> Map output bytes 300,214,289,928
>
> In two of our mappers which timed out I saw:
>
> 2010-11-24 23:16:51,561 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, host=us01-ciqps1-name01.carrieriq.com:2181 sessionTimeout=60000 watcher=org.apache.hadoop.hbase.client.hconnectionmanager$clientzkwatc...@f855562
> 2010-11-24 23:16:51,563 INFO org.apache.zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
> 2010-11-24 23:16:51,585 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server us01-ciqps1-name01.carrieriq.com/10.202.50.100:2181
> 2010-11-24 23:16:51,593 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.202.50.101:63047 remote=us01-ciqps1-name01.carrieriq.com/10.202.50.100:2181]
> 2010-11-24 23:16:51,596 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
> 2010-11-24 23:16:55,127 INFO com.carrieriq.m2m.platform.mmp2.input.StripedHBaseTableInputFormat: Starting scan of table 'GRID-GRIDSQL-STAGING-THREEGPPSPEECHCALLS-1290634808555'
>
> As of this moment, GRID-GRIDSQL-STAGING-THREEGPPSPEECHCALLS-1290634808555
> has been deleted because of failure handling in our flow.
>
> Our monitoring script started noticing the following at 2010-11-24 23-39-50 (GMT):
>
> HBase Shell; enter 'help<RETURN>' for list of supported commands.
> Version: 0.20.6, r965666, Mon Jul 19 15:48:07 PDT 2010
> get 'GRID-GRIDSQL-STAGING-THREEGPPSPEECHCALLS-1290634808555','7B7C0D0BC834B8BD53422AFA94023223'
> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server us01-ciqps1-grid12.carrieriq.com:60020 for region GRID-GRIDSQL-STAGING-THREEGPPSPEECHCALLS-1290634808555,7B7C0D0BC834B8BD53422AFA94023223,1290638846310, row '7B7C0D0BC834B8BD53422AFA94023223', but failed after 7 attempts.
> Exceptions:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
>
> I have collected region server log (where I found occurrences of
> GRID-GRIDSQL-STAGING-THREEGPPSPEECHCALLS-1290634808555) and master log
> I can send the zipped tar ball to you upon request.
>
> Have a nice holiday.