[ https://issues.apache.org/jira/browse/HBASE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856756#action_12856756 ]

Todd Lipcon commented on HBASE-2439:
------------------------------------

I can verify this bug - I saw it in my testing this afternoon as well. Relevant 
logs:

2010-04-13 19:19:01,322 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
compaction completed on region test1,4542214000,1271211529201 in 1sec
2010-04-13 19:19:01,322 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Starting compaction on region test1,4230893000,1271197630557
2010-04-13 19:19:01,328 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: 
test1,9652090000,1271197954171
2010-04-13 19:19:01,371 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
region test1,4983930000,1271198319326/845482654 available; sequence id is 
2300024
2010-04-13 19:19:01,371 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: 
test1,8588812000,1271198227526
2010-04-13 19:19:01,375 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
compaction completed on region test1,4230893000,1271197630557 in 0sec
2010-04-13 19:19:01,376 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Starting split of region test1,4230893000,1271197630557
2010-04-13 19:19:01,379 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Closed test1,4230893000,1271197630557
2010-04-13 19:19:01,383 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
region test1,8588812000,1271198227526/314680607 available; sequence id is 
2302647
2010-04-13 19:19:01,383 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: 
test1,9652090000,1271197954171
2010-04-13 19:19:01,385 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: 
test1,8075160000,1271198620343
2010-04-13 19:19:01,399 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: 
test1,4300550000,1271198160421
2010-04-13 19:19:01,419 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
region test1,9652090000,1271197954171/173125735 available; sequence id is 
2299793
2010-04-13 19:19:01,420 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: 
test1,8075160000,1271198620343
2010-04-13 19:19:01,433 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: 
test1,4662590000,1271197688651
2010-04-13 19:19:01,448 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 11 on 60020' on region .META.,,1: 
memstore size 32.0k is >= than blocking 32.0k size

Eventually the RS just devolves into repeatedly writing:
2010-04-13 19:20:17,396 WARN 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region .META.,,1 has too 
many store files, putting it back at the end of the flush queue.
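For context, here is a minimal sketch (simplified, with hypothetical names, not the actual MemStoreFlusher code) of why that WARN line repeats forever: a region whose store-file count is at or above hbase.hstore.blockingStoreFiles never gets flushed, only requeued, so with the single CompactSplitThread stuck the queue can never drain:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FlushQueueSketch {
    static final int BLOCKING_STORE_FILES = 15; // hbase.hstore.blockingStoreFiles

    // One pass over the flush queue. Each entry is a region's store-file
    // count. Regions at or over the limit are skipped and requeued
    // ("putting it back at the end of the flush queue"); the rest flush
    // and leave the queue. Returns how many regions were requeued.
    static int drainOnce(Deque<Integer> flushQueue) {
        int requeued = 0;
        for (int i = flushQueue.size(); i > 0; i--) {
            int storeFiles = flushQueue.pollFirst();
            if (storeFiles >= BLOCKING_STORE_FILES) {
                flushQueue.addLast(storeFiles);
                requeued++;
            }
            // else: flush proceeds and the region leaves the queue
        }
        return requeued;
    }

    public static void main(String[] args) {
        Deque<Integer> q = new ArrayDeque<>();
        q.add(16); // .META. with 16 store files: never flushable
        q.add(3);  // a normal region: flushes and leaves the queue
        System.out.println(drainOnce(q)); // 1 region requeued
        System.out.println(drainOnce(q)); // still 1: loops until compactions resume
    }
}
```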

I'll try Kannan's patch.
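For reference, a sketch of the arithmetic behind the blocking threshold, under the assumption (per the issue description below) that the blocking size is hbase.hregion.memstore.block.multiplier times the hard-coded 16KB .META. flush size. The "blocking 32.0k size" in my log would correspond to a multiplier of 2, and the "blocking 64.0k size" in Kannan's log to a multiplier of 4:

```java
public class MetaBlockingThreshold {
    // Hard-coded memstore flush size for .META. (per the issue description)
    static final long META_FLUSH_SIZE = 16 * 1024;

    // Updates are blocked once the memstore reaches multiplier * flush size.
    static boolean shouldBlockUpdates(long memstoreSize, int multiplier) {
        return memstoreSize >= (long) multiplier * META_FLUSH_SIZE;
    }

    public static void main(String[] args) {
        // ~32.0k memstore with multiplier 2 -> blocked
        System.out.println(shouldBlockUpdates(32 * 1024 + 800, 2)); // true
        // ~64.2k memstore with multiplier 4 -> blocked (64.2k >= 64.0k)
        System.out.println(shouldBlockUpdates(64 * 1024 + 200, 4)); // true
        // 16k memstore with multiplier 4 -> not blocked
        System.out.println(shouldBlockUpdates(16 * 1024, 4));       // false
    }
}
```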

> HBase can get stuck if updates to META are blocked
> --------------------------------------------------
>
>                 Key: HBASE-2439
>                 URL: https://issues.apache.org/jira/browse/HBASE-2439
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>         Attachments: 2439_0.20_dont_block_meta.txt
>
>
> (We noticed this on a import-style test in a small test cluster.)
> If compactions are running slow, and we are doing a lot of region splits, 
> then, since META has a much smaller hard-coded memstore flush size (16KB), it 
> quickly accumulates lots of store files. Once this exceeds 
> "hbase.hstore.blockingStoreFiles", flushes to META become no-ops. This causes 
> META's memstore footprint to grow. Once this exceeds 
> "hbase.hregion.memstore.block.multiplier * 16KB", we block further updates to 
> META.
> In my test setup:
>   hbase.hregion.memstore.block.multiplier = 4.
> and,
>   hbase.hstore.blockingStoreFiles = 15.
> And we saw messages of the form:
> {code}
> 2010-04-09 18:37:39,539 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Blocking updates for 'IPC Server handler 23 on 60020' on region .META.,,1: 
> memstore size 64.2k is >= than blocking 64.0k size
> {code}
> Now suppose that around the same time the CompactSplitThread does a compaction 
> and determines it is going to split the region. As part of finishing the 
> split, it wants to update META about the daughter regions. 
> It'll end up waiting for the META to become unblocked. The single 
> CompactSplitThread is now held up, and no further compactions can proceed.  
> META's compaction request is itself blocked because the compaction queue will 
> never get cleared.
> This essentially creates a deadlock, and the region server is unable to make 
> any further progress. Eventually, every region server's CompactSplitThread 
> ends up in the same state.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
