[
https://issues.apache.org/jira/browse/HBASE-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
dhruba borthakur reassigned HBASE-3604:
---------------------------------------
Assignee: dhruba borthakur
> Two region servers think that they own the same region: data loss
> -----------------------------------------------------------------
>
> Key: HBASE-3604
> URL: https://issues.apache.org/jira/browse/HBASE-3604
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.90.0
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
>
> I observed this on a 100 node cluster that is constantly doing about 500K
> ops/second.
> The region server on machine A was servicing IOs for a particular region.
> Then the machine went into a bad state where it is ping-able but not
> ssh-able. The master detected that there is a problem with machine A and
> reassigned the region to machine B. The regionserver on machine B opened the
> region and opened all the required HFiles for this region. After two hours,
> the NameNode received a delete request for one of the HFiles from machine A
> and happily renamed the file to HDFS-Trash. After another 3 hours or so, the
> regionserver on machine B tried to read contents from that HFile but failed
> because the file was renamed earlier. The region server on B in now stuck,
> and possible data loss.
> The problems stems from the fact that although the master-and-ZK reassigned
> the region, the old regionserver was not possibly dead.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira