[
https://issues.apache.org/jira/browse/HBASE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856979#action_12856979
]
Kannan Muthukkaruppan commented on HBASE-2439:
----------------------------------------------
@Todd:
<<< eventually the RS just devolves into repeatedly writing:
2010-04-13 19:20:17,396 WARN
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region .META.,,1 has too
many store files, putting it back at the end of the flush queue.
>>>
yes, that's exactly what I ran into as well.
Re: <<<< I think we should commit the whole thing for the durability branch,
and everything but the table descriptor change for the 0.20.4 branch >>>>
I am a little confused about 0.20.4 vs. 0.20.5 vs. the durability branch. I see
some issues moved to 0.20.5. Is 0.20's trunk now essentially 0.20.5?
And is there a separate durability branch outside of the 0.20.x series?
regards,
Kannan
> HBase can get stuck if updates to META are blocked
> --------------------------------------------------
>
> Key: HBASE-2439
> URL: https://issues.apache.org/jira/browse/HBASE-2439
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: Kannan Muthukkaruppan
> Assignee: Kannan Muthukkaruppan
> Attachments: 2439_0.20_dont_block_meta.txt
>
>
> (We noticed this on an import-style test in a small test cluster.)
> If compactions are running slow, and we are doing a lot of region splits,
> then, since META has a much smaller hard-coded memstore flush size (16KB), it
> quickly accumulates lots of store files. Once this exceeds
> "hbase.hstore.blockingStoreFiles", flushes to META become no-ops. This causes
> META's memstore footprint to grow. Once this exceeds
> "hbase.hregion.memstore.block.multiplier * 16KB", we block further updates to
> META.
> In my test setup:
> hbase.hregion.memstore.block.multiplier = 4.
> and,
> hbase.hstore.blockingStoreFiles = 15.
> And we saw messages of the form:
> {code}
> 2010-04-09 18:37:39,539 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 23 on 60020' on region .META.,,1:
> memstore size 64.2k is >= than blocking 64.0k size
> {code}
> Now, suppose around the same time the CompactSplitThread does a compaction and
> determines it is going to split the region. As part of finishing the split, it
> wants to update META about the daughter regions.
> It'll end up waiting for the META to become unblocked. The single
> CompactSplitThread is now held up, and no further compactions can proceed.
> META's compaction request is itself blocked because the compaction queue will
> never get cleared.
> This essentially creates a deadlock, and the region server is unable to make
> any further progress. Eventually, each region server's CompactSplitThread
> ends up in the same state.
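A minimal sketch of the arithmetic behind the blocking, under the settings quoted above (this is a hypothetical helper for illustration, not HBase source): with .META.'s hard-coded 16KB flush size and a block multiplier of 4, updates block once the memstore reaches 64KB, which matches the "blocking 64.0k size" in the log line.

```java
// Hypothetical sketch, not HBase source: the update-blocking threshold
// implied by the configuration quoted in the description.
public class MetaBlockingThreshold {
    // Updates to a region are blocked once its memstore size
    // reaches flushSize * multiplier.
    static long blockingSize(long flushSizeBytes, int multiplier) {
        return flushSizeBytes * multiplier;
    }

    public static void main(String[] args) {
        long metaFlushSize = 16 * 1024; // .META.'s hard-coded flush size (16KB)
        int multiplier = 4;             // hbase.hregion.memstore.block.multiplier
        // 16KB * 4 = 64KB, so a 64.2k memstore trips the block.
        System.out.println(blockingSize(metaFlushSize, multiplier));
    }
}
```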
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira