On Tue, Jan 26, 2010 at 12:56 PM, Kannan Muthukkaruppan <kan...@facebook.com> wrote:
> 2) region server log does report ulimit of 64k.
>

Good. Then we need to figure out why it's not effective, at least for the DNs -- something like a kernel issue, as per the J-D pointer.

> 3) An aside. Still using -XX:+CMSIncrementalMode. But I recall reading that
> for machines with more than 2 cores, the recommendation is to not use the
> incremental mode.

Yes. Here is the paragraph we quote: http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#icms. My guess is that it's probably just ignored on a server-class machine like yours.

You fellas probably have a bunch of GC experts in house. Grab one and show them your logs, or post here and we can take a looksee.
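As an aside on the "why isn't the 64k limit effective for the DN" question: below is a minimal, hypothetical check -- not part of HBase or Hadoop, class name made up -- that prints the file-descriptor limit a JVM actually inherits. It assumes a Sun JDK, since com.sun.management.UnixOperatingSystemMXBean is not a standard API. Run it under the same user/environment that launches the DataNode.

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    public class FdLimitCheck {
      public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The Unix-specific bean is Sun/Oracle JDK only.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
          com.sun.management.UnixOperatingSystemMXBean unix =
              (com.sun.management.UnixOperatingSystemMXBean) os;
          // These reflect the limit the process actually got, not what your login shell shows.
          System.out.println("max fds:  " + unix.getMaxFileDescriptorCount());
          System.out.println("open fds: " + unix.getOpenFileDescriptorCount());
        } else {
          System.out.println("Not a Sun/Unix JVM; cannot read fd counts this way.");
        }
      }
    }

If that prints 1024 even though ulimit -a in your shell says 64k, the limit is being dropped somewhere between the shell and the daemon (e.g. whatever init/ssh path starts the DN), which would line up with the wiki pointers above.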
St.Ack

>
> -----Original Message-----
> From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of Stack
> Sent: Tuesday, January 26, 2010 8:32 AM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> This is suspicious. Make sure that the second line in the master or
> regionserver log, where it prints out the ulimit for the process, says 64k
> (and not 1024). You may not have set it for the DN user, or setting it
> on your linux may be ornery (see
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6 for a note about
> red hat based systems having a system-wide upper bound, or here, for an
> interesting issue around kernel limits:
> http://wiki.apache.org/hadoop/Hbase/FAQ#A6).
> St.Ack
>
> On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
> <kan...@facebook.com> wrote:
>>
>> Looking further up in the logs (about 20 minutes prior, when the
>> errors first started happening), I noticed the following.
>>
>> btw, ulimit -a shows that I have "open files" set to 64k. Is that not
>> sufficient?
>>
>> 2010-01-25 11:10:21,774 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):DataXceiveServer: java.io.IOException: Too many open files
>>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>>         at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> 2010-01-25 11:10:21,566 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Got exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
>> java.io.FileNotFoundException: /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta (Too many open files)
>>         at java.io.FileInputStream.open(Native Method)
>>         at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> ________________________________________
>> From: Kannan Muthukkaruppan [kan...@facebook.com]
>> Sent: Tuesday, January 26, 2010 7:01 AM
>> To: hbase-dev@hadoop.apache.org
>> Subject: RE: scanner lease expired/region server shutdown
>>
>> 1. Yes, it is a 5-node setup.
>>
>> 1 Name Node / 4 Data Nodes. Of the 4 DNs, one is running the HBase Master, and the other three are running region servers. ZK is on all the same 5 nodes. Should ideally have separated this out. The nodes are 16GB, 4-disk machines.
>>
>> 2. I examined the HDFS datanode log on the same machine around the time the problems happened, and saw this:
>>
>> 2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:47729]
>>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> [For future runs, will try to turn on Java GC logging as well, per your suggestion.]
>>
>> 3. You wrote <<< There is no line after the above? A region with a startkey of 0031841132? >>>
>>
>> I just cut-and-pasted a section of the PE log and the scan of '.META.' from the shell. But there were many such regions for which the PE client reported errors of the form:
>>
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: Connection refused
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0015411453,99999999999999
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>>
>> and a scan of '.META.' revealed that those regions, such as 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>>
>> <<< Then it's a dropped edit of .META. The parent region -- the one being split -- got updates which added the splitA and splitB and its offlining, but the new region didn't get inserted? The crash happened just at that time? >>>
>>
>> Didn't fully understand the above.
>>
>> <<< It would be interesting to take a look at the regionserver logs from that time. Please post if you have a moment so we can take a looksee. >>>
>>
>> Will do. Should I just send it as an attachment to the list? Or is there a recommended way of doing this?
>>
>> regards,
>> Kannan
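A side note on the locateRegionInMeta retries quoted above: in the 0.20 client those attempts are governed by hbase.client.retries.number and hbase.client.pause -- the "attempt N of 10" and the 1000 ms sleeps backing off to 16000 ms in the logs come from those two settings. A small, hypothetical snippet (class name made up) showing where a client would raise them if it just needs to ride over a slow region reassignment:

    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ClientRetryConfig {
      public static void main(String[] args) {
        HBaseConfiguration conf = new HBaseConfiguration();
        // Number of times the client retries a region lookup/operation
        // before giving up (the "attempt N of 10" in the log above).
        conf.setInt("hbase.client.retries.number", 10);
        // Base sleep between retries, in milliseconds; later attempts back
        // off to multiples of this ("sleep of 1000" ... "sleep of 16000").
        conf.setLong("hbase.client.pause", 1000L);
        System.out.println("retries=" + conf.getInt("hbase.client.retries.number", -1)
            + " pause=" + conf.getLong("hbase.client.pause", -1L));
      }
    }

Raising these only buys time, of course; it does not help if the region stays permanently offline, as in the split case discussed further down.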
>> ________________________________________
>> From: saint....@gmail.com [saint....@gmail.com] On Behalf Of Stack [st...@duboce.net]
>> Sent: Monday, January 25, 2010 8:59 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: scanner lease expired/region server shutdown
>>
>> What J-D said, but with ornamentation. See below.
>>
>> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
>> <kan...@facebook.com> wrote:
>>> I was doing some brute force testing -- running one instance of
>>> PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>>>
>>
>> What's the cluster you're hitting like? That 5-node thingy? What's the
>> hardware profile?
>>
>>> One of the region servers went down after a while. [This is on 0.20.2.] The
>>> region server log had things like:
>>>
>>> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
>>
>> This is saying that it took 65 seconds to append to hdfs. What was
>> going on at that time? A fat GC in the regionserver or over in a
>> Datanode? You can enable GC logging by uncommenting stuff in
>> hbase-env.sh. Feed the GC log to https://gchisto.dev.java.net/
>> (suggested by the zookeeper lads). It's good for finding the long
>> pauses. We should find the logs around the long GC pause. It's
>> probably a failed promotion that brought on the stop-the-world GC. Or
>> your HDFS was struggling?
>>
>> I was going to say that the PE puts the worst kind of loading on the
>> hbase cache -- nothing sticks around -- but looking at your numbers
>> below, the cache seems to be working pretty well.
>>
>>> 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832), Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
>>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
>>> 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>>>
>>> This was followed by:
>>> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.selectionkeyi...@788ab708
>>> java.io.IOException: TIMED OUT
>>
>> This is the zk timeout.
>>
>>> And also errors like:
>>> 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>>>
>>
>> These will happen after the above. The regionserver is on its way
>> down. Probably emptied the list of outstanding regions.
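For anyone following along, the "lease expired" line above is the server-side scanner lease: the regionserver drops a scanner whose next() has not been called within hbase.regionserver.lease.period (60000 ms by default in 0.20), and later calls on that scanner get UnknownScannerException. Here the 65-second stall in the hlog append apparently meant the scanner could not be renewed in time. Below is a minimal, hypothetical client sketch against the 0.20 API just to show where the lease applies -- table and class names are made up, and this is not the PE code itself:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScanLeaseSketch {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "TestTable");
        ResultScanner scanner = table.getScanner(new Scan());
        try {
          for (Result row : scanner) {
            // If the gap between next() calls exceeds
            // hbase.regionserver.lease.period (60000 ms by default) --
            // for example because of a long GC pause on either side --
            // the server expires the lease and the next call throws
            // UnknownScannerException.
            // ... process(row) ...
          }
        } finally {
          scanner.close();  // releases the server-side lease promptly
        }
      }
    }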
>>> Subsequently, I restarted the region server and just started the PE in
>>> sequentialWrite mode. It seems to be off to a really slow start. For
>>> several regions, it is printing the following:
>>>
>>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>>> [..and so on..]
>>> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
>>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>>>
>>> When I scan '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:
>>>
>>> TestTable,0031841132,1264121653303  column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTable,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774', ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>>> TestTable,0031841132,1264121653303  column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>>> TestTable,0031841132,1264121653303  column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>>> TestTable,0031841132,1264121653303  column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\001&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\005false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\000\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\000\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_MEMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\275\337\274#
>>> TestTable,0031841132,1264121653303  column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\001&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\005false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\000\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\000\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_MEMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\275\312\262
>>>
>>> Any thoughts/ideas on what might be going on? Appreciate any input in helping understand this.
>>>
>> There is no line after the above? A region with a startkey of
>> 0031841132? Then it's a dropped edit of .META. The parent region --
>> the one being split -- got the updates which added splitA and splitB and
>> its offlining, but the new daughter region didn't get inserted? The crash
>> happened just at that time?
>>
>> It would be interesting to take a look at the regionserver logs from
>> that time. Please post if you have a moment so we can take a looksee.
>>
>> The above kind of thing is what the master rewrite is about: moving state
>> transitions up to zk so that moving regions through transitions is atomic
>> across the cluster, rather than, as here, a multi-row update that might
>> not all go through as things currently work.
>>
>> St.Ack
>>
>>> Regards,
>>> Kannan
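On the "dropped edit of .META." theory: as described above, the split updates the parent row (OFFLINE, SPLIT, info:splitA/info:splitB) and should also insert rows for the two daughters, so a crash partway through can leave clients resolving to a parent that is permanently offline -- which matches the "region offline" retries in the PE log. Below is a rough, hypothetical sketch against the 0.20 client API for listing such split parents in .META.; the class name and output format are made up, and it is only an inspection aid, not a repair tool:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.Writables;

    public class ListSplitParents {
      public static void main(String[] args) throws Exception {
        HTable meta = new HTable(new HBaseConfiguration(), ".META.");
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("info"));
        ResultScanner scanner = meta.getScanner(scan);
        try {
          for (Result row : scanner) {
            byte[] bytes = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("regioninfo"));
            if (bytes == null) continue;
            HRegionInfo info = Writables.getHRegionInfo(bytes);
            // Offline + split marks a parent region; each daughter should have
            // its own row in .META. -- if one is missing, the split edit was
            // only partially applied, as in the case discussed above.
            if (info.isOffline() && info.isSplit()) {
              System.out.println("split parent: " + info.getRegionNameAsString());
            }
          }
        } finally {
          scanner.close();
        }
      }
    }

Cross-checking its output against the daughter names embedded in info:splitA/info:splitB (visible in the shell scan above) would show which daughter row never made it into .META.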