[ https://issues.apache.org/jira/browse/HBASE-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657660#action_12657660 ]
Andrew Purtell commented on HBASE-1062:
---------------------------------------
> Is it wise postponing memcache flushes?
I thought safe mode should be essentially "don't touch DFS".
> We schedule compactions on open and on flush. This would put off the open
> scheduling for interval of 2 minutes. If cluster went down ugly, and some
> regions had References outstanding, then these regions would not be splittable
Wouldn't the references be cleared when the deferred compactions are finally
allowed to run? Then the split would happen. This is what I observe in
testing.
> Do we ever break out of this loop [...] Looks like we increment count then
> set it to zero after sleep. It never progresses?
The code in question just sleeps (once) during the CompactSplitThread main loop
if count becomes greater than limit; count is then reset, so the loop does
progress.
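To make the sleep-then-reset behavior concrete, here is a minimal sketch of that
kind of throttle. It is not the code from 1062-1.patch: the class name, the
field names (limit, pauseMillis), and the afterRequest() hook are all
hypothetical, and the real CompactSplitThread loop may order the check
differently.

```java
/**
 * Hypothetical illustration of a "sleep once, then reset" throttle.
 * All names here are assumptions; this is not the CompactSplitThread code.
 */
final class CompactionThrottleSketch {
    private final int limit;         // assumed: requests allowed before one pause
    private final long pauseMillis;  // assumed: length of that single pause
    private int count = 0;           // requests handled since the last pause
    private int pauses = 0;          // bookkeeping for the demonstration only

    CompactionThrottleSketch(int limit, long pauseMillis) {
        this.limit = limit;
        this.pauseMillis = pauseMillis;
    }

    /** Call once per main-loop iteration, after a compaction/split request is handled. */
    void afterRequest() {
        count++;
        if (count > limit) {
            try {
                Thread.sleep(pauseMillis);  // a single sleep, not a retry loop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve interrupt status
            }
            pauses++;
            count = 0;  // reset, so the next iterations run unthrottled again
        }
    }

    int pauses() { return pauses; }

    int count() { return count; }

    public static void main(String[] args) {
        CompactionThrottleSketch throttle = new CompactionThrottleSketch(3, 10L);
        for (int i = 0; i < 10; i++) {
            throttle.afterRequest();
        }
        // With limit = 3, ten requests trigger a pause on the 4th and the 8th.
        System.out.println("pauses=" + throttle.pauses() + " count=" + throttle.count());
    }
}
```

In this sketch the counter only gates a pause between requests, so resetting it
to zero after the sleep does not stall the loop; it just starts the next batch.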
It looks like I still need to stretch the compact/split ramp-up over a longer
slope, at least given our cluster and circumstances. The current patch helps,
but we can still overwhelm DFS sometimes after a restart.
> Compactions at (re)start on a large table can overwhelm DFS
> -----------------------------------------------------------
>
> Key: HBASE-1062
> URL: https://issues.apache.org/jira/browse/HBASE-1062
> Project: Hadoop HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Critical
> Fix For: 0.20.0
>
> Attachments: 1062-1.patch
>
>
> Given a large table, > 1000 regions for example, if a cluster restart is
> necessary, the compactions undertaken by the regionservers when the master
> makes initial region assignments can overwhelm DFS, leading to file errors
> and data loss. This condition is exacerbated if write load was heavy before
> restart and so many regions want to split as soon as they are opened.