On Tue, Jan 26, 2010 at 12:56 PM, Kannan Muthukkaruppan
<kan...@facebook.com> wrote:

> 2) region server log does report ulimit of 64k.
>

Good.  Then we need to figure out why it's not effective, at least for
the DNs -- something like a kernel issue, as per the J-D pointer.
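
One cheap way to confirm what the running DN is actually allowed -- as
opposed to what `ulimit -a` says in your login shell -- is to read the
process's limits straight out of /proc, on a kernel new enough to expose
them.  A minimal sketch (the pid argument is whatever `jps` reports for
the DataNode):

  import java.io.BufferedReader;
  import java.io.FileReader;

  // Prints the "Max open files" soft/hard limits the kernel is actually
  // enforcing on a running process (Linux with /proc/<pid>/limits only).
  public class CheckProcLimits {
    public static void main(String[] args) throws Exception {
      String pid = args[0];  // DataNode pid, e.g. from `jps`
      BufferedReader r =
          new BufferedReader(new FileReader("/proc/" + pid + "/limits"));
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("Max open files")) {
          System.out.println(line);
        }
      }
      r.close();
    }
  }

If that prints 1024 while your shell says 64k, the setting never made it
to the DN process (started as a different user, or via an init path that
does not pick up limits.conf).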


> 3) An aside. Still using -XX:+CMSIncrementalMode. But I recall reading that 
> for machines with more than 2 cores, the recommendation is to not use the 
> incremental mode.

Yes.  Here is the paragraph we quote:
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#icms.
My guess is that it's probably just ignored on a server-class machine
like yours.
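
If you want to be sure what the JVM actually got handed, rather than
what hbase-env.sh intends, something like the below (just a sketch,
nothing HBase-specific) dumps the input arguments and core count seen
by the current process:

  import java.lang.management.ManagementFactory;

  // Prints the core count and the arguments the JVM was started with,
  // so you can confirm whether -XX:+CMSIncrementalMode is really in
  // effect on this box.
  public class ShowJvmArgs {
    public static void main(String[] args) {
      System.out.println("cores: "
          + Runtime.getRuntime().availableProcessors());
      for (String arg :
          ManagementFactory.getRuntimeMXBean().getInputArguments()) {
        System.out.println("jvm arg: " + arg);
      }
    }
  }

(`jinfo <pid>` against the live regionserver gets you much the same
information, if you have it handy.)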

You fellas probably have a bunch of GC experts in house.  Grab one and
show them your logs or post here and we can take a looksee.
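
If the gchisto route from my earlier mail is a bother to set up, even a
crude scan of the GC log for the long pauses will usually point at the
culprit.  A rough sketch, assuming the usual -verbose:gc /
-XX:+PrintGCDetails output where pause lines end in something like
", 0.1234567 secs]":

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  // Prints any GC log line whose reported pause exceeds a threshold.
  // Not a replacement for gchisto, just a quick way to surface the
  // long stop-the-world pauses.
  public class FindLongPauses {
    private static final Pattern SECS =
        Pattern.compile("([0-9]+\\.[0-9]+) secs\\]");

    public static void main(String[] args) throws Exception {
      double threshold = 10.0;  // seconds; anything this long is trouble
      BufferedReader r = new BufferedReader(new FileReader(args[0]));
      String line;
      while ((line = r.readLine()) != null) {
        Matcher m = SECS.matcher(line);
        double max = 0.0;
        while (m.find()) {
          max = Math.max(max, Double.parseDouble(m.group(1)));
        }
        if (max >= threshold) {
          System.out.println(max + "s: " + line);
        }
      }
      r.close();
    }
  }

Anything in the tens of seconds would line up suspiciously well with the
65-second appends and the zk session timeout below.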

St.Ack


>
> -----Original Message-----
> From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of Stack
> Sent: Tuesday, January 26, 2010 8:32 AM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> This is suspicious.  Make sure that the second line in the master or
> regionserver log, where it prints the ulimit for the process, shows 64k
> (and not 1024).  You may not have set it for the DN user, or setting it
> on your Linux may be ornery (see
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6 for a note about
> the system-wide upper bound on Red Hat based systems, or here for an
> interesting issue around kernel limits:
> http://wiki.apache.org/hadoop/Hbase/FAQ#A6).
> St.Ack
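
Also worth watching the actual fd count on the DN while the load is
running.  A throwaway sketch -- run it as the DN user or root so /proc
lets you look:

  import java.io.File;

  // Counts the file descriptors a process currently holds open by
  // listing /proc/<pid>/fd (Linux only).
  public class CountOpenFds {
    public static void main(String[] args) {
      String pid = args[0];  // DataNode pid, e.g. from `jps`
      File[] fds = new File("/proc/" + pid + "/fd").listFiles();
      System.out.println("open fds: "
          + (fds == null ? "unreadable" : fds.length));
    }
  }

If that number marches up toward 1024 under the PE load, the 64k setting
is not reaching the DN.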
>
> On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
> <kan...@facebook.com> wrote:
>>
>> Looking further up in the logs (about 20 minutes prior, when the errors 
>> first started happening), I noticed the following.
>>
>> btw, ulimit -a shows that I have "open files" set to 64k. Is that not 
>> sufficient?
>>
>> 2010-01-25 11:10:21,774 WARN 
>> org.apache.hadoop.hdfs.server.datanode.DataNode: 
>> DatanodeRegistration(10.129.68.212:50010, 
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, 
>> ipcPort=50020):DataXceiveServer: java.io.IOException: Too many open files
>>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>        at 
>> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>>        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>>        at 
>> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>> 2010-01-25 11:10:21,566 WARN 
>> org.apache.hadoop.hdfs.server.datanode.DataNode: 
>> DatanodeRegistration(10.129.68.212:50010, 
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, 
>> ipcPort=50020):Got exception while serving blk_3332344970774019423_10249 
>> to /10.129.68.212:
>> java.io.FileNotFoundException: 
>> /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta 
>> (Too many open files)
>>        at java.io.FileInputStream.open(Native Method)
>>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>        at 
>> org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>>        at 
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>>        at 
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>>        at 
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>>
>>
>> ________________________________________
>> From: Kannan Muthukkaruppan [kan...@facebook.com]
>> Sent: Tuesday, January 26, 2010 7:01 AM
>> To: hbase-dev@hadoop.apache.org
>> Subject: RE: scanner lease expired/region server shutdown
>>
>> 1. Yes, it is a 5-node setup.
>>
>> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and 
>> the other three are running region servers. ZK is on all the same 5 nodes. 
>> Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>>
>> 2. I examined the HDFS datanode log on the same machine around the time the 
>> problems happened, and saw this:
>>
>> 2010-01-25 11:33:09,531 WARN 
>> org.apache.hadoop.hdfs.server.datanode.DataNode: 
>> DatanodeRegistration(10.129.68.212:50010, 
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, 
>> ipcPort=50020):Got exception while serving blk_5691809099673541164_10475 
>> to /10.129.68.212:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
>> channel to be ready for write. ch : 
>> java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 
>> remote=/10.129.68.212:47729]
>>        at 
>> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>        at 
>> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>        at 
>> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>        at 
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>>        at 
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>>        at 
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>>        at 
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>> [For future runs, will try to turn on Java GC logging as well as per your 
>> suggestion.]
>>
>> 3.  You wrote <<< There is no line after the above?  A region with a 
>> startkey of 0031841132?  >>>
>>
>> I just cut-and-pasted a section of the PE log and of the '.META.' scan from 
>> the shell. But there were many such regions for which the PE client reported 
>> errors of the form:
>>
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: 
>> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 
>> because: Connection refused
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed 
>> .META.,,1 for tableName=.META. from cache because of 
>> TestTable,0015411453,99999999999999
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached 
>> location for .META.,,1 is 10.129.68.212:60020
>>
>> and a scan of '.META.' revealed that those regions, such as 
>> 'TestTable,0015411453,99999999999999', had splitA & splitB entries in .META.
>>
>> <<< Then it's a dropped edit of .META.  The parent region -- the one being 
>> split -- got the updates that added splitA and splitB and marked it offline, 
>> but the new daughter regions didn't get inserted?  The crash happened just 
>> at that time? >>>
>>
>> Didn't fully understand the above.
>>
>> <<< It would be interesting to take a look at the regionserver logs from 
>> that time.  Please post if you have a moment so we can take a looksee.>>>
>>
>> Will do. Should I just send it as an attachment to the list? Or is there a 
>> recommended way of doing this?
>>
>> regards,
>> Kannan
>> ________________________________________
>> From: saint....@gmail.com [saint....@gmail.com] On Behalf Of Stack 
>> [st...@duboce.net]
>> Sent: Monday, January 25, 2010 8:59 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: scanner lease expired/region server shutdown
>>
>> What J-D said but with ornamentation.  See below.
>>
>> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
>> <kan...@facebook.com> wrote:
>>> I was doing some brute force testing - running one instance of 
>>> PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>>>
>>
>> What's the cluster you're hitting like?  That 5-node thingy?  What's the
>> hardware profile?
>>
>>
>>> One of the region servers went down after a while. [This is on 0.20.2]. The 
>>> region server log had things like:
>>>
>>> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC 
>>> Server handler 34 on 60020 took 65190ms appending an edit to hlog; 
>>> editcount=27878
>>
>> This is saying that it took 65 seconds to append to HDFS.  What was
>> going on at that time?  A fat GC in the regionserver, or over in a
>> Datanode?  You can enable GC logging by uncommenting stuff in
>> hbase-env.sh.  Feed the GC log to https://gchisto.dev.java.net/
>> (suggested by the zookeeper lads).  It's good for finding the long
>> pauses.  We should look at the logs around the long GC pause.  It's
>> probably a failed promotion that brought on the stop-the-world GC.  Or
>> was your HDFS struggling?
>>
>> I was going to say that the PE puts the worst kind of loading on the
>> hbase cache -- nothing sticks around -- but looking at your numbers
>> below, the cache seems to be working pretty well.
>>
>>> 2010-01-25 11:33:39,416 DEBUG 
>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: 
>>> Total=1815.4441MB (1903631104), Free=581.23096MB (609464832), 
>>> Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, 
>>> Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit 
>>> Ratio=85.40924191474915%, Miss Ratio=14.590759575366974%, 
>>> Evicted/Run=3766.69091796875
>>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC 
>>> Server handler 52 on 60020 took 65195ms appending an edit to hlog; 
>>> editcount=27879
>>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC 
>>> Server handler 75 on 60020 took 65195ms appending an edit to hlog; 
>>> editcount=27880
>>> 2010-01-25 11:33:39,417 INFO 
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
>>> -5220035164735014996 lease expired
>>>
>>> This was followed by:
>>> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception 
>>> closing session 0x12652c620520001 to 
>>> sun.nio.ch.selectionkeyi...@788ab708
>>> java.io.IOException: TIMED OUT
>>
>> This is the zk timeout.
>>
>>>
>>> And also errors like:
>>> 2010-01-25 11:33:39,692 ERROR 
>>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>>> org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>>>
>>
>> These will happen after the above.  The regionserver is on its way
>> down.  Probably emptied the list of outstanding regions.
>>
>>
>>> Subsequently, I restarted the region server and just started the PE in 
>>> sequentialWrite mode. It seems to be off to a really slow start. For 
>>> several regions, it is printing the following:
>>>
>>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: 
>>> locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 
>>> because: region offline: TestTable,0031841132,1264121653303
>>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed 
>>> .META.,,1 for tableName=.META. from cache because of 
>>> TestTable,0032000000,99999999999999
>>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached 
>>> location for .META.,,1 is 10.129.68.212:60020
>>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: 
>>> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 
>>> because: region offline: TestTable,0031841132,1264121653303
>>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed 
>>> .META.,,1 for tableName=.META. from cache because of 
>>> TestTable,0032000000,99999999999999
>>> [..and so on..]
>>> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached 
>>> location for .META.,,1 is 10.129.68.212:60020
>>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: 
>>> locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 
>>> because: region offline: TestTable,0031841132,1264121653303
>>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed 
>>> .META.,,1 for tableName=.META. from cache because of 
>>> TestTable,0032000000,99999999999999
>>>
>>> When I scanned '.META.', I noticed that most of the regions for which I 
>>> got the above error were in the middle of (?) a split. For example:
>>>
>>> TestTable,0031841132,126412 column=info:regioninfo, 
>>> timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>>>  1653303                     ble,0031841132,1264121653303', STARTKEY => 
>>> '0031841132', ENDKEY => '0032057774',
>>>                             ENCODED => 644500792, OFFLINE => true, SPLIT => 
>>> true, TABLE => {{NAME => 'TestTab
>>>                             le', FAMILIES => [{NAME => 'info', VERSIONS => 
>>> '3', COMPRESSION => 'NONE', TTL =>
>>>                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY 
>>> => 'false', BLOCKCACHE => 'true'}]
>>>                             }}
>>>  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, 
>>> value=10.129.68.214:60020
>>>  1653303
>>>  TestTable,0031841132,126412 column=info:serverstartcode, 
>>> timestamp=1264121655148, value=1264109117245
>>>  1653303
>>>  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, 
>>> value=\000\n0031978284\000\000\000\0
>>>  1653303                     
>>> 01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>>>                             
>>> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>>                             
>>> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>>                             
>>> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>>                             
>>> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>>                             
>>> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>>                             
>>> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>>>                             5\337\274#
>>>  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, 
>>> value=\000\n0032057774\000\000\000\0
>>>  1653303                     
>>> 01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>>>                             
>>> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>>                             
>>> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>>                             
>>> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>>                             
>>> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>>                             
>>> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>>                             
>>> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>>>                             75\312\262
>>>
>>> Any thoughts/ideas on what might be going on? Appreciate any input in 
>>> helping understand it.
>>>
>>
>> There is no line after the above?  A region with a startkey of
>> 0031841132?  Then it's a dropped edit of .META.  The parent region --
>> the one being split -- got the updates that added splitA and splitB
>> and marked it offline, but the new daughter regions didn't get
>> inserted?  The crash happened just at that time?
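
If it helps narrow it down: a little client like the below will list the
offlined split parents still sitting in .META.; any parent printed whose
daughters you cannot find as their own rows is a candidate for the
dropped edit described above.  Written against the 0.20 client API from
memory, so treat it as a sketch rather than gospel.

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  // Scans .META. and prints rows that still carry splitA/splitB columns,
  // i.e. split parents that have been offlined.
  public class FindSplitParents {
    public static void main(String[] args) throws Exception {
      HTable meta = new HTable(new HBaseConfiguration(), ".META.");
      Scan scan = new Scan();
      scan.addFamily(Bytes.toBytes("info"));
      ResultScanner scanner = meta.getScanner(scan);
      for (Result row : scanner) {
        byte[] a = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("splitA"));
        byte[] b = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("splitB"));
        if (a != null || b != null) {
          System.out.println("split parent: " + Bytes.toString(row.getRow()));
        }
      }
      scanner.close();
    }
  }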
>>
>> It would be interesting to take a look at the regionserver logs from
>> that time.  Please post if you have a moment so we can take a looksee.
>>
>> The above kind of thing is what the master rewrite is about: moving
>> state transitions up into zk so that moving regions through
>> transitions is atomic across the cluster, rather than, as here, a
>> multi-row update that might not all go through the way things
>> currently work.
>>
>> St.Ack
>>
>>> Regards,
>>> Kannan
>>>
>>>
>>>
>>
>
