[
https://issues.apache.org/jira/browse/HBASE-13937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594090#comment-14594090
]
Enis Soztutar commented on HBASE-13937:
---------------------------------------
bq. Looking at the V2 patch. So we check once if the server is in the dead list
and then proceed to ping. This patch hoists out this check:
I think the patch does the exact opposite. It keeps the
{{synchronized (this.onlineServers)}} part, but removes the
{{catch (RegionServerStoppedException | ServerNotRunningYetException e)}} part.
The intent is to apply v2 directly, without reverting the previous patch.
bq. This lgtm for application to 0.98, modulo the multicatch (Java 7+ only)
will need to be converted to equivalent Java 6 idiom.
Do you mind re-reviewing according to the above?
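For reference, the multi-catch conversion mentioned in the quoted note could look roughly like the sketch below. This is only an illustration, not code from either patch: the class {{MultiCatchSketch}}, the two {{classify}} methods and their return strings are hypothetical, and the nested exception classes are stand-ins for the HBase exceptions of the same name.

```java
import java.io.IOException;

public class MultiCatchSketch {
    // Hypothetical stand-ins for the HBase exceptions named in the comment.
    static class RegionServerStoppedException extends IOException {}
    static class ServerNotRunningYetException extends IOException {}

    // Java 7+ form: one multi-catch clause handles both exception types.
    static String classifyJava7(IOException failure) {
        try {
            if (failure != null) {
                throw failure;
            }
            return "ok";
        } catch (RegionServerStoppedException | ServerNotRunningYetException e) {
            return "not-reachable";
        } catch (IOException e) {
            return "other";
        }
    }

    // Java 6 form: the multi-catch is split into two clauses with the same body.
    static String classifyJava6(IOException failure) {
        try {
            if (failure != null) {
                throw failure;
            }
            return "ok";
        } catch (RegionServerStoppedException e) {
            return "not-reachable";
        } catch (ServerNotRunningYetException e) {
            return "not-reachable";
        } catch (IOException e) {
            return "other";
        }
    }
}
```

Both forms behave the same; the Java 6 version only duplicates the handler body.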
> Partially revert HBASE-13172
> -----------------------------
>
> Key: HBASE-13937
> URL: https://issues.apache.org/jira/browse/HBASE-13937
> Project: HBase
> Issue Type: Sub-task
> Components: Region Assignment
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 0.98.14, 1.2.0, 1.1.1, 1.3.0
>
> Attachments: hbase-13937_v1.patch, hbase-13937_v2.patch
>
>
> HBASE-13172 is supposed to fix a unit test (UT) issue, but causes other problems
> that the parent jira (HBASE-13605) is attempting to fix.
> However, HBASE-13605 patch v4 uncovers at least 2 different issues which are,
> to put it mildly, major design flaws in AM / RS.
> Regardless of 13605, the issue with 13172 is that we catch
> {{ServerNotRunningYetException}} from {{isServerReachable()}} and return
> false, which then puts the server on the {{RegionStates.deadServers}} list.
> Once it is on that list, we can still assign and unassign regions to the RS
> after it has started (because regular assignment does not check whether the
> server is in {{RegionStates.deadServers}}). However, after the first assign
> and unassign, we cannot assign the region again, since the check for the
> lastServer will then think that the server is dead.
> It turns out that a proper patch for 13605 is very hard without fixing the rest
> of the broken AM assumptions (see HBASE-13605, HBASE-13877 and HBASE-13895 for a
> colorful history). For 1.1.1, I think we should just revert parts of
> HBASE-13172 for now.
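
The failure mode in the description above can be modeled with a toy sketch. This is not HBase code: {{DeadServerSketch}}, {{ping}}, {{checkServer}} and {{canReassign}} are hypothetical names, the exception class is a stand-in for the HBase one, and the {{deadServers}} set only imitates the role of {{RegionStates.deadServers}}. It assumes a server that is still starting up answers a ping with {{ServerNotRunningYetException}}.

```java
import java.util.HashSet;
import java.util.Set;

public class DeadServerSketch {
    // Hypothetical stand-in for the HBase exception of the same name.
    static class ServerNotRunningYetException extends Exception {}

    // Toy stand-in for the RegionStates.deadServers list.
    final Set<String> deadServers = new HashSet<String>();

    // Simulated ping: a server that is still starting up throws.
    void ping(boolean stillStarting) throws ServerNotRunningYetException {
        if (stillStarting) {
            throw new ServerNotRunningYetException();
        }
    }

    // Models the problematic behavior: "not running yet" is reported as unreachable.
    boolean isServerReachable(boolean stillStarting) {
        try {
            ping(stillStarting);
            return true;
        } catch (ServerNotRunningYetException e) {
            return false;
        }
    }

    // An unreachable server is recorded as dead, even if it was only starting up.
    void checkServer(String server, boolean stillStarting) {
        if (!isServerReachable(stillStarting)) {
            deadServers.add(server);
        }
    }

    // Toy version of the lastServer check: a region whose last server is on
    // the dead list is not assigned again.
    boolean canReassign(String lastServer) {
        return !deadServers.contains(lastServer);
    }
}
```

In this model, a server checked while it is still starting stays on the dead list after it comes up, so a later {{canReassign}} call for it returns false.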
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)