[ https://issues.apache.org/jira/browse/HBASE-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768813#comment-13768813 ]
stack commented on HBASE-9514:
------------------------------
Here you are adding back the random region move to try and bring on the issue
again:
- new FlushRandomRegionOfTableAction(tableName)
+ new FlushRandomRegionOfTableAction(tableName),
+ new MoveRandomRegionOfTableAction(tableName)
Why do we need this?
- this.maximumAttempts =
- this.server.getConfiguration().getInt("hbase.assignment.maximum.attempts", 10);
+ this.maximumAttempts = Math.max(1,
+ this.server.getConfiguration().getInt("hbase.assignment.maximum.attempts", 10));
Could it be configured to zero? Are you saying we should try at least once?
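To make the question concrete, here is a minimal sketch of what the Math.max(1, ...) clamp does. The Map-based getInt helper is a hypothetical stand-in for the real Configuration lookup, used only so the example runs without HBase on the classpath:

```java
import java.util.HashMap;
import java.util.Map;

public class MaxAttemptsClamp {
    // Hypothetical stand-in for Configuration.getInt(key, defaultValue).
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    // Mirrors the patched line: never allow fewer than one attempt.
    static int maximumAttempts(Map<String, String> conf) {
        return Math.max(1, getInt(conf, "hbase.assignment.maximum.attempts", 10));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Key not set: the default of 10 is used.
        System.out.println(maximumAttempts(conf));
        // Zero (or a negative value) is raised to 1.
        conf.put("hbase.assignment.maximum.attempts", "0");
        System.out.println(maximumAttempts(conf));
    }
}
```

With the clamp in place, a value of zero or less no longer disables assignment attempts; at least one attempt is always made.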
+ public Lock acquireLock(final String encodedName) {
I suppose this has to be public because SSH wants to use it too?
How long do servers hang out in the dead servers list?
{code}
+ if (!region.isMetaRegion() &&
+ regionStates.wasRegionOnDeadServer(encodedName)) {
+ LOG.info("Skip assigning " + region.getRegionNameAsString()
+ + " because it's host " + regionStates.getLastRegionServerOfRegion(encodedName)
+ + " is dead but not processed");
+ // Make sure the region is offline so that SSH will assign it.
+ // Need to make sure we don't race with SSH.
+ regionOffline(region);
+ return;
+ }
{code}
I suppose it doesn't matter if a server stays in the dead servers list for a
long time, since each server has a startcode?
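A rough sketch of why the startcode would make this safe. ServerId below is a simplified, hypothetical stand-in for HBase's ServerName (host, port, startcode), and the host, port and startcode values are made up:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class StartcodeIdentity {
    // Simplified stand-in for ServerName: identity is (host, port, startcode).
    static final class ServerId {
        final String host;
        final int port;
        final long startcode;

        ServerId(String host, int port, long startcode) {
            this.host = host;
            this.port = port;
            this.startcode = startcode;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ServerId)) {
                return false;
            }
            ServerId other = (ServerId) o;
            return host.equals(other.host) && port == other.port
                && startcode == other.startcode;
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, startcode);
        }
    }

    public static void main(String[] args) {
        Set<ServerId> deadServers = new HashSet<>();
        deadServers.add(new ServerId("rs1.example.org", 60020, 100L));

        // Same host and port, restarted later with a new startcode.
        ServerId restarted = new ServerId("rs1.example.org", 60020, 200L);

        // The restarted instance is not mistaken for the dead one.
        System.out.println(deadServers.contains(restarted));
    }
}
```

Under that assumption, an old entry lingering in the dead servers list would not match a restarted instance on the same host and port.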
Does this big block of new code have to go into the middle of assign? Can it
be broken up a little into methods that are easier to grok?
{code}
+ if (serverManager.isServerOnline(server) &&
+ (t instanceof java.net.SocketTimeoutException ||
+ t instanceof java.net.ConnectException)) {
{code}
Is it a good idea to insert this wait here for every exception? What if the
exception is an NSRE (NotServingRegionException)? Doesn't an NSRE indicate a
live server?
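One way to read the quoted condition is as a predicate that allows the wait only for connectivity failures on a server still registered as online. The sketch below is an assumption about intent, not the patch itself; shouldWaitBeforeRetry is a hypothetical helper name, and IllegalStateException merely stands in for a non-connectivity exception such as an NSRE:

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

public class RetryWaitPolicy {
    // Hypothetical helper: wait before retrying only if the server is still
    // considered online and the failure looks like a connectivity problem.
    static boolean shouldWaitBeforeRetry(boolean serverOnline, Throwable t) {
        return serverOnline
            && (t instanceof SocketTimeoutException
                || t instanceof ConnectException);
    }

    public static void main(String[] args) {
        // Connectivity failure against an online server: wait.
        System.out.println(
            shouldWaitBeforeRetry(true, new ConnectException("refused")));
        // Any other exception (stand-in for an NSRE): do not wait.
        System.out.println(
            shouldWaitBeforeRetry(true, new IllegalStateException("not serving")));
        // Server already known to be offline: do not wait.
        System.out.println(
            shouldWaitBeforeRetry(false, new SocketTimeoutException("timed out")));
    }
}
```

Pulling the check into a named method like this would also help with breaking the large block in assign into smaller pieces.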
The big change in the middle I cannot follow. Can we have a note on what it
does?
I'd say declare and assign in one go:
+ lastAssignments = new HashMap<String, ServerName>();
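Something along these lines, as a sketch only. The holder class and its methods are hypothetical, String stands in for ServerName so the example is self-contained, and making the field final is my assumption rather than something the patch shows:

```java
import java.util.HashMap;
import java.util.Map;

public class LastAssignmentsHolder {
    // Declared and initialized in one statement; final so the reference
    // cannot be reassigned later. Not thread-safe as written.
    private final Map<String, String> lastAssignments =
        new HashMap<String, String>();

    // Remember which server a region (by encoded name) was last assigned to.
    void record(String encodedName, String server) {
        lastAssignments.put(encodedName, server);
    }

    // Returns the last known server for the region, or null if unknown.
    String lastServerOf(String encodedName) {
        return lastAssignments.get(encodedName);
    }

    public static void main(String[] args) {
        LastAssignmentsHolder holder = new LastAssignmentsHolder();
        holder.record("region-a", "rs1.example.org,60020,100");
        System.out.println(holder.lastServerOf("region-a"));
        System.out.println(holder.lastServerOf("region-b"));
    }
}
```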
I like this map in RS.
Good stuff Jimmy
> Prevent region from assigning before log splitting is done
> ----------------------------------------------------------
>
> Key: HBASE-9514
> URL: https://issues.apache.org/jira/browse/HBASE-9514
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Reporter: Jimmy Xiang
> Assignee: Jimmy Xiang
> Priority: Blocker
> Attachments: trunk-9514_v1.patch
>
>
> If a region is assigned before log splitting is done by the server shutdown
> handler, the edits belonging to this region in the hlogs of the dead server
> will be lost.
> Generally this is not an issue if users don't assign/unassign a region from
> hbase shell or via hbase admin. These commands are marked for experts only in
> the hbase shell help too. However, chaos monkey doesn't care.
> If we can prevent such regions from being assigned at a bad time, it would make
> things a little safer.