[
https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070154#comment-13070154
]
Eran Kutner commented on HBASE-4060:
------------------------------------
I will try to elaborate a bit on what I had in mind; I think it is not very far
from what Andrew suggested earlier.
First I should say that I am not familiar enough with the current
implementation, so my understanding may not be correct or accurate. However,
based on what I understand, the current implementation does not seem robust
enough, because it relies on active communication between the master and the
RSs, which leaves room for timeouts and failures.
My suggestion is to be more proactive about monitoring the assignment of
regions and to allow the RSs themselves to know which regions are assigned to
them at any time.
I suggest adding a new znode hierarchy in ZK listing the regions and their
assignment. It could be something like /hbase/regions/<table>/<region>, so each
region would have its own znode. Under that would be a znode for the assigned RS.
When the master assigns a region to a RS, it should delete the old owner record
from the list and add the new one.
When a RS gets an assignment command from the master, it should list the
children of the znode corresponding to the assigned region and set a watcher on
it. The RS should verify that it is indeed the owner registered in ZK; if it is
not, it should immediately refuse the region assignment command.
If the RS receives an event from one of the watchers it has set, it should
re-check that region's assignment and validate that it is still the owner of
the region. If it is not, it should relinquish control over the region.
The process so far should guarantee that regions are never double-assigned;
however, it may create orphan regions that are not assigned to any RS. To
resolve that, the master should periodically check for unassigned regions and
reassign them.
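The master's periodic sweep could look like this sketch (the assignments and server names are made up; in practice the master would read them from ZK):

```python
# Sketch: find regions with no live owner so the master can reassign them.
def find_orphans(assignments, live_servers):
    # assignments: region name -> owning RS, or None if unassigned.
    return sorted(region for region, owner in assignments.items()
                  if owner is None or owner not in live_servers)

assignments = {"r1": "rs-a", "r2": None, "r3": "rs-dead"}
orphans = find_orphans(assignments, {"rs-a", "rs-b"})
print(orphans)  # ['r2', 'r3']
```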
> Making region assignment more robust
> ------------------------------------
>
> Key: HBASE-4060
> URL: https://issues.apache.org/jira/browse/HBASE-4060
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Fix For: 0.92.0
>
>
> From Eran Kutner:
> My concern is that the region allocation process seems to rely too much on
> timing considerations and doesn't seem to take enough measures to guarantee
> conflicts do not occur. I understand that in a distributed environment, when
> you don't get a timely response from a remote machine you can't know for
> sure whether it did or did not receive the request; however, there are things
> that can be done to mitigate this and reduce the conflict window
> significantly. For example, when I run hbck it knows that some regions are
> multiply assigned; the master could do the same and try to resolve the
> conflict. Another approach would be to handle late responses: even if the
> response from the remote machine arrives after it was assumed to be dead,
> the master should have enough information to know it had created a conflict
> by assigning the region to another server. An even better solution, I think,
> is for the RS to periodically verify that it is indeed the rightful owner of
> every region it holds and relinquish control over the region if it's not.
> Obviously a state where two RSs hold the same region is pathological and can
> lead to data loss, as demonstrated in my case. The system should be able to
> actively protect itself against such a scenario. It probably doesn't need
> saying, but there is really nothing worse for a data storage system than
> data loss.
> In my case the problem didn't happen in the initial phase but after
> disabling and enabling a table with about 12K regions.
> For more background information, see 'Errors after major compaction'
> discussion on [email protected]
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira