On Tue, Jan 26, 2010 at 10:10 AM, Dhruba Borthakur <dhr...@gmail.com> wrote:
> This exception is from the DataNode, right? This means that the datanode
> process has 32K files open simultaneously, how can that be? For each block
> read/write the datanode has two open files, one for the data and one for the
> .meta where the crc gets stored.
>

Yeah, it's on the datanode. I was just being a little indirect saying check
the regionserver log because it prints out the ulimit that the process sees.
If it's not 64k there, it's probably not 64k for the DN either.
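If it helps, here is a quick way to see both the limit the running DN process
actually got and how much of it is currently in use. Just a sketch: it assumes
Linux, that the JDK's jps/jstack tools are on the PATH, and that you can read
the DN's /proc entries; the 'hadoop' user below is only an example.

    # Find the DataNode pid
    DN_PID=$(jps | awk '/DataNode/ {print $1}')

    # The open-files limit the running process sees (can differ from your shell's ulimit -a)
    grep 'open files' /proc/$DN_PID/limits

    # How many file descriptors it has open right now
    ls /proc/$DN_PID/fd | wc -l

    # Rough count of active xceiver threads (each holds the block file plus its .meta),
    # assuming the DataXceiver class name shows up in the thread names
    jstack $DN_PID | grep -c DataXceiver

    # To raise the limit, add a line like this to /etc/security/limits.conf for
    # whichever user runs the DN, then restart the daemon from a fresh login:
    #   hadoop  -  nofile  65536

If the fd count is nowhere near the limit right before it falls over, then
something other than plain load is chewing up descriptors.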
> On the other hand, the datanode is configured via dfs.datanode.max.xcievers
> to support 2048 read/write requests simultaneously, right?
>

So, you'd think he'd see the xceiver complaint before "too many open files"
if he was running into the 64k limit.

St.Ack

> thanks,
> dhruba
>
>
> On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
> <kan...@facebook.com> wrote:
>
>>
>> Looking further up in the logs (about 20 minutes prior, when the errors
>> first started happening), I noticed the following.
>>
>> btw, ulimit -a shows that I have "open files" set to 64k. Is that not
>> sufficient?
>>
>> 2010-01-25 11:10:21,774 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):DataXceiveServer: java.io.IOException: Too many open files
>>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>>         at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> 2010-01-25 11:10:21,566 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):Got exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
>> java.io.FileNotFoundException:
>> /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta
>> (Too many open files)
>>         at java.io.FileInputStream.open(Native Method)
>>         at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> ________________________________________
>> From: Kannan Muthukkaruppan [kan...@facebook.com]
>> Sent: Tuesday, January 26, 2010 7:01 AM
>> To: hbase-dev@hadoop.apache.org
>> Subject: RE: scanner lease expired/region server shutdown
>>
>> 1. Yes, it is a 5-node setup.
>>
>> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and
>> the other three are running region servers. ZK is on all the same 5 nodes.
>> Should ideally have separated this out. The nodes are 16GB, 4-disk machines.
>>
>> 2. I examined the HDFS datanode log on the same machine around the time
>> the problems happened, and saw this:
>>
>> 2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
>> channel to be ready for write. ch :
>> java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:47729]
>>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> [For future runs, will try to turn on Java GC logging as well, as per your
>> suggestion.]
>>
>> 3. You wrote <<< There is no line after the above? A region with a
>> startkey of 0031841132? >>>
>>
>> I just cut-and-pasted a section of the PE log and the scan of '.META.' from
>> the shell. But there were many such regions for which the PE client reported
>> errors of the form:
>>
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: Connection refused
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0015411453,99999999999999
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>>
>> and a scan of '.META.' revealed that those regions, such as
>> 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>>
>> <<< Then it's a dropped edit of .META. The parent region -- the one being
>> split -- got the updates which added splitA and splitB and offlined it, but
>> the new region didn't get inserted? The crash happened just at that
>> time? >>>
>>
>> Didn't fully understand the above.
>>
>> <<< It would be interesting to take a look at the regionserver logs from
>> that time. Please post if you have a moment so we can take a looksee. >>>
>>
>> Will do. Should I just send it as an attachment to the list? Or is there a
>> recommended way of doing this?
>>
>> regards,
>> Kannan
>> ________________________________________
>> From: saint....@gmail.com [saint....@gmail.com] On Behalf Of Stack [st...@duboce.net]
>> Sent: Monday, January 25, 2010 8:59 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: scanner lease expired/region server shutdown
>>
>> What J-D said but with ornamentation. See below.
>>
>> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
>> <kan...@facebook.com> wrote:
>> > I was doing some brute force testing - running one instance of
>> > PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>> >
>>
>> What's the cluster you're hitting like? That 5-node thingy? What's the
>> hardware profile?
>>
>> > One of the region servers went down after a while. [This is on 0.20.2].
>> > The region server log had things like:
>> >
>> > 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
>>
>> This is saying that it took 65 seconds to append to hdfs. What was
>> going on at that time? A fat GC in the regionserver or over in a
>> Datanode?
>> You can enable GC logging by uncommenting stuff in the hbase-env.sh
>> [see the sketch at the bottom of this mail]. Feed the GC log to
>> https://gchisto.dev.java.net/ (suggested by the zookeeper lads). It's good
>> for finding the long pauses. We should find the logs around the long GC
>> pause. It's probably a failed promotion that brought on the stop-the-world
>> GC. Or your HDFS was struggling?
>>
>> I was going to say that the PE puts the worst kind of loading on the
>> hbase cache -- nothing sticks around -- but looking at your numbers
>> below, the cache seems to be working pretty well.
>>
>> > 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832), Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
>> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
>> > 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>> >
>> > This was followed by:
>> > 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.selectionkeyi...@788ab708
>> > java.io.IOException: TIMED OUT
>>
>> This is the zk timeout.
>>
>> > And also errors like:
>> > 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
>> > org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>> >
>>
>> These will happen after the above. The regionserver is on its way
>> down. It has probably emptied the list of outstanding regions.
>>
>> > Subsequently, I restarted the region server and just started the PE in
>> > sequentialWrite mode. It seems to be off to a really slow start. For
>> > several regions, it is printing the following:
>> >
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> > [..and so on..]
>> > 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> >
>> > When I scan the '.META.', I noticed that most of the regions for which I
>> > get the above error were in the middle of (?) a split. For example:
>> >
>> > TestTable,0031841132,1264121653303  column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTable,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774', ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>> > TestTable,0031841132,1264121653303  column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>> > TestTable,0031841132,1264121653303  column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>> > TestTable,0031841132,1264121653303  column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\001&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\005false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\000\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\000\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_MEMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\275\337\274#
>> > TestTable,0031841132,1264121653303  column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\001&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\005false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\000\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\000\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_MEMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\275\312\262
>> >
>> > Any thoughts/ideas on what might be going on? Appreciate any input in
>> > helping understand what might be going on.
>> >
>>
>> There is no line after the above? A region with a startkey of
>> 0031841132? Then it's a dropped edit of .META. The parent region -- the
>> one being split -- got the updates which added splitA and splitB and
>> offlined it, but the new region didn't get inserted? The crash happened
>> just at that time?
>>
>> It would be interesting to take a look at the regionserver logs from
>> that time. Please post if you have a moment so we can take a looksee.
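(Interjecting here since the logs request is still open: before attaching
whole files, a grep like the below usually narrows the regionserver log down
to the interesting split window. Just a sketch -- it assumes the stock log
file naming, hbase-<user>-regionserver-<host>.log, and the region/server
names from the .META. scan above; adjust paths to your install.)

    # The server carrying the parent region, per the info:server cell above,
    # was 10.129.68.214. On that box, pull split-related lines plus anything
    # mentioning the parent region's start key:
    egrep -i 'split|0031841132' hbase-*-regionserver-*.log* | less

A few hundred lines around the split of TestTable,0031841132 should be enough
to tell whether the daughter region inserts ever made it into .META.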
>>
>> The above kind of thing is what the master rewrite is about: moving state
>> transitions up into zk so they are atomic across the cluster as regions
>> move through transitions, rather than, as things currently work, a
>> multi-row update that might not all go through.
>>
>> St.Ack
>>
>> > Regards,
>> > Kannan
>> >
>
> --
> Connect to me at http://www.facebook.com/dhruba
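P.S. Kannan, since you said you'd turn on GC logging for the next run: the
hbase-env.sh bit referenced above looks roughly like the following. Treat it
as a sketch -- the exact flags and log path in the shipped file vary by
version and setup, so adjust to taste.

    # In conf/hbase-env.sh: uncomment/extend HBASE_OPTS so each daemon's JVM
    # writes a GC log you can feed to GCHisto afterwards. If you run more than
    # one daemon per box, give each its own log path.
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
        -XX:+PrintGCTimeStamps -Xloggc:/tmp/hbase-gc.log"

Restart the regionservers so the new opts take effect, then feed the resulting
log to the GCHisto tool mentioned above to spot the long pauses.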