[ 
https://issues.apache.org/jira/browse/HBASE-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062215#comment-13062215
 ] 

stack commented on HBASE-4058:
------------------------------

Dan Harvey, who is still on 0.20.x, had a similar issue this month.  He added 
four new servers to his cluster.  These new servers were not resolving 
properly.  What we were seeing, I believe, is that on startup these new servers 
would be assigned their portion of the regions on checkin.  Then the 
basescanner would run -- it's 0.20.x hbase -- and it would not recognize the 
address the new servers were writing to .META., so it would think the regions 
unassigned and assign them elsewhere.  So we had double-assignment, and at the 
same time there were splits and compactions running.  His .META. had holes 
and overlaps.
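
To make "holes and overlaps" concrete, here is a rough sketch of how one might 
check a table's region boundaries for gaps and double coverage.  It is a toy 
model only, not what hbck actually does: regions are assumed to be plain 
(start_key, end_key) string pairs, an empty string is assumed to mean "start of 
table" or "end of table", and the function name is made up for illustration.

```python
def find_holes_and_overlaps(regions):
    """Return (holes, overlaps) for a list of (start_key, end_key) pairs.

    Assumption: '' as a start key means start of table and '' as an end
    key means end of table. This is a simplified model of region rows.
    """
    ordered = sorted(regions)
    holes, overlaps = [], []
    if not ordered:
        return holes, overlaps

    # The first region should begin at the start of the table.
    if ordered[0][0] != "":
        holes.append(("", ordered[0][0]))

    # covered_end tracks how far the key space is covered so far.
    covered_end = ordered[0][1]
    for start, end in ordered[1:]:
        if covered_end == "" or start < covered_end:
            # This region begins inside key space that is already covered.
            overlaps.append((start, end))
        elif start > covered_end:
            # Nothing covers the keys between covered_end and start.
            holes.append((covered_end, start))
        # Extend the covered range if this region reaches further.
        if covered_end != "" and (end == "" or end > covered_end):
            covered_end = end

    # The last region should run to the end of the table.
    if covered_end != "":
        holes.append((covered_end, ""))
    return holes, overlaps


if __name__ == "__main__":
    sample = [("", "b"), ("b", "d"), ("c", "e"), ("g", "")]
    print(find_holes_and_overlaps(sample))
```

In the sample, the region starting at "c" overlaps the one ending at "d", and 
nothing covers the keys between "e" and "g".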

In his case, not all tables were honked.  Just the big ones.  I wonder if an 
improved add_table.rb would work in this case; i.e. do the same rewrite of the 
.META. content for a single table based off the content in the filesystem 
rather than trying to fix up the .META. table in place.

Let me try adding add_table.rb to hbck.  Let me add an option for running it 
per table, and then a global one that restores all tables.

Dan sent me the .META. dir content.  It looks like this:

{code}
-rw-r--r--@ 1 Stack  staff         0 Jul  7 08:26 281906331022358506
-rw-r--r--@ 1 Stack  staff  94283152 Jul  7 08:26 5233066973300534672
-rw-r--r--@ 1 Stack  staff         0 Jul  7 08:26 6803125877105432645
-rw-r--r--@ 1 Stack  staff         0 Jul  7 08:26 8650632001596730954
{code}

i.e. three zero-length files.  I wonder how these were written (I asked him for 
a dir listing from the actual cluster).
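
For a local copy of the directory like the one listed above, a small check for 
zero-length files could look something like the following.  This is a sketch 
only: it assumes the files sit on the local filesystem (not HDFS), and the 
function name is hypothetical.

```python
import os


def zero_length_files(directory):
    """Return the sorted names of regular files in `directory` that are empty."""
    return sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file() and entry.stat().st_size == 0
    )


if __name__ == "__main__":
    # Checks the current directory; point it at the copied .META. dir instead.
    for name in zero_length_files("."):
        print(name)
```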

> Extend TestHBaseFsck with a complete .META. recovery scenario
> -------------------------------------------------------------
>
>                 Key: HBASE-4058
>                 URL: https://issues.apache.org/jira/browse/HBASE-4058
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>             Fix For: 0.92.0
>
>
> We should have a unit test that launches a minicluster and constructs a few 
> tables, then deletes META files on disk, then bounces the master, then 
> recovers the result with HBCK. Perhaps it is possible to extend TestHBaseFsck 
> to do this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
