On Tue, Jan 26, 2010 at 10:10 AM, Dhruba Borthakur <dhr...@gmail.com> wrote:
> This exception is from the DataNode, right? This means that the datanode
> process has 32K files open simultaneously, how can that be? For each block
> read/write the datanode has two open files, one for the data and one for the
> .meta where the crc gets stored.
>

Yeah, it's on the datanode. I was just being a little indirect saying check
the regionserver log because it prints out the ulimit that the process sees.
If it's not 64k there, it's probably not 64k for the DN either.
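If it helps, here is a quick way to see both the limit the running DN process
actually got and how much of it is currently in use. Just a sketch: it assumes
Linux, that the JDK's jps/jstack tools are on the PATH, and that you can read
the DN's /proc entries; the 'hadoop' user below is only an example.

    # Find the DataNode pid
    DN_PID=$(jps | awk '/DataNode/ {print $1}')

    # The open-files limit the running process sees (can differ from your shell's ulimit -a)
    grep 'open files' /proc/$DN_PID/limits

    # How many file descriptors it has open right now
    ls /proc/$DN_PID/fd | wc -l

    # Rough count of active xceiver threads (each holds the block file plus its .meta),
    # assuming the DataXceiver class name shows up in the thread names
    jstack $DN_PID | grep -c DataXceiver

    # To raise the limit, add a line like this to /etc/security/limits.conf for
    # whichever user runs the DN, then restart the daemon from a fresh login:
    #   hadoop  -  nofile  65536

If the fd count is nowhere near the limit right before it falls over, then
something other than plain load is chewing up descriptors.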
> On the other hand, the datanode is configured via dfs.datanode.max.xcievers
> to support 2048 read/write requests simultaneously, right?
>

So, you'd think he'd see the xceiver complaint before "too many open files"
if he was running into the 64k limit.

St.Ack

> thanks,
> dhruba
>
>
> On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
> <kan...@facebook.com> wrote:
>
>>
>> Looking further up in the logs (about 20 minutes prior, when the errors
>> first started happening), I noticed the following.
>>
>> btw, ulimit -a shows that I have "open files" set to 64k. Is that not
>> sufficient?
>>
>> 2010-01-25 11:10:21,774 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):DataXceiveServer: java.io.IOException: Too many open files
>>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>>         at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> 2010-01-25 11:10:21,566 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):Got exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
>> java.io.FileNotFoundException:
>> /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta
>> (Too many open files)
>>         at java.io.FileInputStream.open(Native Method)
>>         at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> ________________________________________
>> From: Kannan Muthukkaruppan [kan...@facebook.com]
>> Sent: Tuesday, January 26, 2010 7:01 AM
>> To: hbase-dev@hadoop.apache.org
>> Subject: RE: scanner lease expired/region server shutdown
>>
>> 1. Yes, it is a 5-node setup.
>>
>> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and
>> the other three are running region servers. ZK is on all the same 5 nodes.
>> Should ideally have separated this out. The nodes are 16GB, 4-disk machines.
>>
>> 2. I examined the HDFS datanode log on the same machine around the time
>> the problems happened, and saw this:
>>
>> 2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
>> channel to be ready for write. ch :
>> java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:47729]
>>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> [For future runs, will try to turn on Java GC logging as well, as per your
>> suggestion.]
>>
>> 3. You wrote <<< There is no line after the above? A region with a
>> startkey of 0031841132? >>>
>>
>> I just cut-and-pasted a section of the PE log and the scan of '.META.' from
>> the shell. But there were many such regions for which the PE client reported
>> errors of the form:
>>
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: Connection refused
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0015411453,99999999999999
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>>
>> and a scan of '.META.' revealed that those regions, such as
>> 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>>
>> <<< Then it's a dropped edit of .META. The parent region -- the one being
>> split -- got the updates which added splitA and splitB and offlined it, but
>> the new region didn't get inserted? The crash happened just at that
>> time? >>>
>>
>> Didn't fully understand the above.
>>
>> <<< It would be interesting to take a look at the regionserver logs from
>> that time. Please post if you have a moment so we can take a looksee. >>>
>>
>> Will do. Should I just send it as an attachment to the list? Or is there a
>> recommended way of doing this?
>>
>> regards,
>> Kannan
>> ________________________________________
>> From: saint....@gmail.com [saint....@gmail.com] On Behalf Of Stack [st...@duboce.net]
>> Sent: Monday, January 25, 2010 8:59 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: scanner lease expired/region server shutdown
>>
>> What J-D said but with ornamentation. See below.
>>
>> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
>> <kan...@facebook.com> wrote:
>> > I was doing some brute force testing - running one instance of
>> > PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>> >
>>
>> What's the cluster you're hitting like? That 5-node thingy? What's the
>> hardware profile?
>>
>> > One of the region servers went down after a while. [This is on 0.20.2].
>> > The region server log had things like:
>> >
>> > 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
>>
>> This is saying that it took 65 seconds to append to hdfs. What was
>> going on at that time? A fat GC in the regionserver or over in a
>> Datanode?
>> You can enable GC logging by uncommenting stuff in the hbase-env.sh
>> [see the sketch at the bottom of this mail]. Feed the GC log to
>> https://gchisto.dev.java.net/ (suggested by the zookeeper lads). It's good
>> for finding the long pauses. We should find the logs around the long GC
>> pause. It's probably a failed promotion that brought on the stop-the-world
>> GC. Or your HDFS was struggling?
>>
>> I was going to say that the PE puts the worst kind of loading on the
>> hbase cache -- nothing sticks around -- but looking at your numbers
>> below, the cache seems to be working pretty well.
>>
>> > 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832), Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
>> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
>> > 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>> >
>> > This was followed by:
>> > 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.selectionkeyi...@788ab708
>> > java.io.IOException: TIMED OUT
>>
>> This is the zk timeout.
>>
>> > And also errors like:
>> > 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
>> > org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>> >
>>
>> These will happen after the above. The regionserver is on its way
>> down. It has probably emptied the list of outstanding regions.
>>
>> > Subsequently, I restarted the region server and just started the PE in
>> > sequentialWrite mode. It seems to be off to a really slow start. For
>> > several regions, it is printing the following:
>> >
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> > [..and so on..]
>> > 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> >
>> > When I scan the '.META.', I noticed that most of the regions for which I
>> > get the above error were in the middle of (?) a split. For example:
>> >
>> > TestTable,0031841132,1264121653303  column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTable,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774', ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>> > TestTable,0031841132,1264121653303  column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>> > TestTable,0031841132,1264121653303  column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>> > TestTable,0031841132,1264121653303  column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\001&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\005false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\000\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\000\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_MEMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\275\337\274#
>> > TestTable,0031841132,1264121653303  column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\001&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\005false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\000\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\000\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_MEMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\275\312\262
>> >
>> > Any thoughts/ideas on what might be going on? Appreciate any input in
>> > helping understand what might be going on.
>> >
>>
>> There is no line after the above? A region with a startkey of
>> 0031841132? Then it's a dropped edit of .META. The parent region -- the
>> one being split -- got the updates which added splitA and splitB and
>> offlined it, but the new region didn't get inserted? The crash happened
>> just at that time?
>>
>> It would be interesting to take a look at the regionserver logs from
>> that time. Please post if you have a moment so we can take a looksee.
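(Interjecting here since the logs request is still open: before attaching
whole files, a grep like the below usually narrows the regionserver log down
to the interesting split window. Just a sketch -- it assumes the stock log
file naming, hbase-<user>-regionserver-<host>.log, and the region/server
names from the .META. scan above; adjust paths to your install.)

    # The server carrying the parent region, per the info:server cell above,
    # was 10.129.68.214. On that box, pull split-related lines plus anything
    # mentioning the parent region's start key:
    egrep -i 'split|0031841132' hbase-*-regionserver-*.log* | less

A few hundred lines around the split of TestTable,0031841132 should be enough
to tell whether the daughter region inserts ever made it into .META.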
>>
>> The above kind of thing is what the master rewrite is about: moving state
>> transitions up into zk so they are atomic across the cluster as regions
>> move through transitions, rather than, as things currently work, a
>> multi-row update that might not all go through.
>>
>> St.Ack
>>
>> > Regards,
>> > Kannan
>> >
>
> --
> Connect to me at http://www.facebook.com/dhruba
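P.S. Kannan, since you said you'd turn on GC logging for the next run: the
hbase-env.sh bit referenced above looks roughly like the following. Treat it
as a sketch -- the exact flags and log path in the shipped file vary by
version and setup, so adjust to taste.

    # In conf/hbase-env.sh: uncomment/extend HBASE_OPTS so each daemon's JVM
    # writes a GC log you can feed to GCHisto afterwards. If you run more than
    # one daemon per box, give each its own log path.
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
        -XX:+PrintGCTimeStamps -Xloggc:/tmp/hbase-gc.log"

Restart the regionservers so the new opts take effect, then feed the resulting
log to the GCHisto tool mentioned above to spot the long pauses.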