Dimitri Goldin created HBASE-8502:
-------------------------------------

             Summary: Stuck Region after failed split
                 Key: HBASE-8502
                 URL: https://issues.apache.org/jira/browse/HBASE-8502
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.92.1
            Reporter: Dimitri Goldin
            Priority: Critical
         Attachments: stuck_region_exception.txt

Exact HBase version: 0.92.1-cdh4.1.2

A couple of days ago I encountered a RIT (region in transition) problem with a single region. After an hbck run, HBase started trying to assign the region, which then kept bouncing between OFFLINE/PENDING_OPEN/OPENING for the following two days. This was due to a split that went wrong in some way and left several reference files in the region directory, even though the two relevant HFiles had been copied successfully to the daughter.

I will try to give as many details as possible, but unfortunately I was unable to find any information about the split itself.

Short thread about this issue on the users ML:
http://mail-archives.apache.org/mod_mbox/hbase-user/201305.mbox/%3c5182758b.1060...@neofonie.de%3E

===
Parent region: 5b9c16898a371de58f31f0bdf86b1f8b
Daughter region in question: 79c619508659018ff3ef0887611eb8f7

Rough sequence from the logs seems to be the following:
===
* Received request to open region: documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.
* Setting up tabledescriptor config now ...
* Opening of region {NAME => 'documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.', STARTKEY => '7128586022887322720', ENDKEY => '7130716361635801616', ENCODED => 79c619508659018ff3ef0887611eb8f7,} failed, marking as FAILED_OPEN in ZK
* File does not exist: /hbase/documents/5b9c16898a371de58f31f0bdf86b1f8b/d/0707b1ec4c6b41cf9174e0d2a1785fe9
[...]
===

What happened was that somehow (and that's the question here) the daughter's region folder contained some left-over reference files, which caused the RegionServer to look up the parent region, which had already been deleted.

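To illustrate how the left-over reference files can be told apart from regular HFiles, here is a minimal Python sketch. It assumes that a reference file is named <hfile name>.<encoded parent region name>, which is consistent with the "File does not exist" path in the log above. The helper names classify_store_files and dangling_references are made up for this example and are not part of HBase or hbck. The sample input is the set of file names found in the daughter's d directory in this report.

```python
import re

# Assumption: a reference file is named "<hfile>.<encoded parent region>",
# while a plain HFile name has no ".<region>" suffix.
REFERENCE_RE = re.compile(r"^([0-9a-f]+)\.([0-9a-f]+)$")


def classify_store_files(names):
    """Split store file names into plain HFiles and (hfile, parent) references."""
    hfiles, references = [], []
    for name in names:
        match = REFERENCE_RE.match(name)
        if match:
            references.append((match.group(1), match.group(2)))
        else:
            hfiles.append(name)
    return hfiles, references


def dangling_references(references, existing_regions):
    """Return the references whose parent region directory no longer exists."""
    return [(hfile, parent) for hfile, parent in references
            if parent not in existing_regions]


# File names from the daughter's column family directory in this report.
listing = [
    "0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b",
    "47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b",
    "4f01ecd052ce464d81e79a62ea227d6b",
    "4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b",
    "eb7dbb09701d4353be24ca82481c4a7e",
]

hfiles, references = classify_store_files(listing)

# The parent region was already deleted, so only the daughter is listed here.
existing_regions = {"79c619508659018ff3ef0887611eb8f7"}
dangling = dangling_references(references, existing_regions)

for hfile, parent in dangling:
    print(f"dangling reference: {hfile} -> missing parent region {parent}")
```

In a real cluster the set of existing regions would have to come from the table directory in HDFS; it is hard-coded here only to keep the sketch self-contained.
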
Original contents of /hbase/documents/79c619508659018ff3ef0887611eb8f7/d:
==
0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b
47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b
4f01ecd052ce464d81e79a62ea227d6b
4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b
eb7dbb09701d4353be24ca82481c4a7e
==

I attached the full FileNotFound exception. Please let me know if I can provide more information or help otherwise.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira