[ https://issues.apache.org/jira/browse/HBASE-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657610#action_12657610 ]
stack commented on HBASE-1062:
------------------------------
A few comments on the patch, Andrew:
+ Is it wise to postpone memcache flushes, even if it's only for the 2 minutes
of HRS safe mode? Can we take on updates during this time? If so, could we
OOME if rabid uploading is afoot?
+ We schedule compactions on open and on flush. This would put off the
open-time scheduling for an interval of 2 minutes. If the cluster went down
ugly and some regions had References outstanding, then these regions would
not be splittable until a memcache flush ran, i.e. until they took on a bunch
of uploads. Maybe that's OK?
+ Do we ever break out of this loop:
{code}
+      if ((limit > 0) && (++count > limit)) {
+        try {
+          Thread.sleep(this.frequency);
+        } catch (InterruptedException ex) {
+          continue;
+        }
+        count = 0;
+      }
{code}
Looks like we increment count, then set it to zero after the sleep. Does it
ever progress?
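To make the question concrete, here is a minimal sketch of how the snippet might behave, assuming it sits inside a loop that handles one work item per iteration (the enclosing loop is not shown in the excerpt, so that is a guess). ThrottleSketch, process, and done are hypothetical names, not HBase code. The sketch also restores the interrupt flag and stops on interrupt instead of using continue, which is a deliberate departure from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the throttled loop; not taken from HBase.
class ThrottleSketch {

  // Handles `items` work units, sleeping whenever more than `limit` units
  // have been counted since the last sleep. Returns the number of sleeps.
  public static int process(int items, int limit, long frequencyMs,
      List<Integer> done) {
    int count = 0;
    int pauses = 0;
    for (int i = 0; i < items; i++) {
      if ((limit > 0) && (++count > limit)) {
        try {
          Thread.sleep(frequencyMs);
        } catch (InterruptedException ex) {
          // Assumption: give up on interrupt rather than `continue` as in
          // the patch excerpt.
          Thread.currentThread().interrupt();
          break;
        }
        pauses++;
        count = 0; // start counting the next batch
      }
      done.add(i); // the "work": record that item i was handled
    }
    return pauses;
  }

  public static void main(String[] args) {
    List<Integer> done = new ArrayList<>();
    int pauses = process(8, 3, 10L, done);
    System.out.println("handled=" + done.size() + " pauses=" + pauses);
  }
}
```

Under that assumption the reset does not stall the loop: the work after the if block still runs on every iteration, and count only decides when the next sleep happens. Whether the real loop progresses depends on what surrounds the snippet in the patch.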
> Compactions at (re)start on a large table can overwhelm DFS
> -----------------------------------------------------------
>
> Key: HBASE-1062
> URL: https://issues.apache.org/jira/browse/HBASE-1062
> Project: Hadoop HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Critical
> Fix For: 0.20.0
>
> Attachments: 1062-1.patch
>
>
> Given a large table, > 1000 regions for example, if a cluster restart is
> necessary, the compactions undertaken by the regionservers when the master
> makes initial region assignments can overwhelm DFS, leading to file errors
> and data loss. This condition is exacerbated if write load was heavy before
> restart and so many regions want to split as soon as they are opened.