[ 
https://issues.apache.org/jira/browse/HBASE-19457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290632#comment-16290632
 ] 

Appy commented on HBASE-19457:
------------------------------

After more debugging, I think I finally have a fix (sorry for being slow, I'm 
just beginning to understand the AM).

So the issue is:
We delete the table's state from meta (in step 
[TRUNCATE_TABLE_REMOVE_FROM_META|https://github.com/apache/hbase/blob/7466e64abb2c68c8a0f40f6051e4b5bf550e69bd/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/TruncateTableProcedure.java#L102]).
On recovery, TableStateManager#fixTableStates assumes that a missing state 
means the table is enabled 
([here|https://github.com/apache/hbase/blob/7466e64abb2c68c8a0f40f6051e4b5bf550e69bd/hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java#L218]).
 
Later we add regions to meta and crash after that. On recovery, the AM sees 
these regions, looks up the table state, finds it enabled, and starts assigning 
them while the truncate is still in progress, which is where things go wrong.

A simple fix here would be: don't delete the table state from meta, just let 
it remain DISABLED.
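
To make the sequence concrete, here is a toy model of it in plain Java. It is only a sketch: the class, the map standing in for meta, and the method bodies are all hypothetical and are not the HBase implementation. Only the names fixTableStates, ENABLED and DISABLED are borrowed from the description above. It also assumes the table is DISABLED when the truncate starts.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: every type here is hypothetical, none of it is HBase code. */
class TruncateRecoverySketch {
    enum TableState { ENABLED, DISABLED }

    /** Models the recovery rule described above: a missing state becomes ENABLED. */
    static void fixTableStates(Map<String, TableState> metaStates, String table) {
        metaStates.putIfAbsent(table, TableState.ENABLED);
    }

    /** Models the AM on recovery: it assigns regions only if the table is ENABLED. */
    static boolean wouldAssignRegions(Map<String, TableState> metaStates, String table) {
        return metaStates.get(table) == TableState.ENABLED;
    }

    /** Runs truncate up to the crash, then recovery; returns whether the AM assigns. */
    static boolean assignsAfterCrash(boolean deleteStateFromMeta) {
        Map<String, TableState> metaStates = new HashMap<>();
        String table = "t1";
        metaStates.put(table, TableState.DISABLED); // assumed state when truncate starts
        if (deleteStateFromMeta) {
            metaStates.remove(table); // models the remove-from-meta step deleting the state
        }
        // Regions are re-added to meta here, then the master crashes.
        fixTableStates(metaStates, table); // recovery
        return wouldAssignRegions(metaStates, table);
    }

    public static void main(String[] args) {
        System.out.println("state deleted:  assigns = " + assignsAfterCrash(true));
        System.out.println("state retained: assigns = " + assignsAfterCrash(false));
    }
}
```

In this model, deleting the state makes recovery treat the table as enabled, so regions get assigned. Keeping the DISABLED state means they are left alone.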
-------

But CreateTableProcedure also adds regions to meta and crashes. Why don't we 
see the same issue there?
It adds region rows to meta, but does not add any row for the table. 
On recovery, when the AM looks for the table state corresponding to those 
regions, TSM#getTableState() throws TableNotFoundException, which gets caught 
[here|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java#L135], 
and so on.
The end result is that it ignores those regions.
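
A rough sketch of that contrast, again as a toy model and not HBase code: the exception class, the maps and regionsToAssign are hypothetical stand-ins, and only the name getTableState comes from the description above. The point is that a table with no row at all leads to an exception and a skip, not to a default of enabled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: every type here is hypothetical, none of it is HBase code. */
class CreateRecoverySketch {
    enum TableState { ENABLED, DISABLED }

    /** Hypothetical stand-in for the exception named above. */
    static class ToyTableNotFoundException extends Exception {
        ToyTableNotFoundException(String table) { super(table); }
    }

    /** Throws when meta has no row for the table, as in the create-table case. */
    static TableState getTableState(Map<String, TableState> tableRows, String table)
            throws ToyTableNotFoundException {
        TableState state = tableRows.get(table);
        if (state == null) {
            throw new ToyTableNotFoundException(table);
        }
        return state;
    }

    /** Returns the regions the toy AM would assign on recovery. */
    static List<String> regionsToAssign(Map<String, String> regionToTable,
                                        Map<String, TableState> tableRows) {
        List<String> toAssign = new ArrayList<>();
        for (Map.Entry<String, String> entry : regionToTable.entrySet()) {
            try {
                if (getTableState(tableRows, entry.getValue()) == TableState.ENABLED) {
                    toAssign.add(entry.getKey());
                }
            } catch (ToyTableNotFoundException e) {
                // No table row yet: the region is ignored rather than assigned.
            }
        }
        return toAssign;
    }

    public static void main(String[] args) {
        Map<String, String> regionToTable = new HashMap<>();
        regionToTable.put("r1", "t1");
        Map<String, TableState> tableRows = new HashMap<>(); // create crashed before adding the table row
        System.out.println("regions to assign: " + regionsToAssign(regionToTable, tableRows));
    }
}
```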

----

Some bigger questions to ponder:
1) Should we really treat a missing state column as enabled? Assuming 
disabled is probably the more conservative and better choice, since it won't 
screw up the cluster. (The only other place that deletes the state column is 
hbck.)
2) Shouldn't new regions always be added with state closed? (dev thread: 
http://mail-archives.apache.org/mod_mbox/hbase-dev/201712.mbox/browser)

> Debugging flaky 
> TestTruncateTableProcedure#testRecoveryAndDoubleExecutionPreserveSplits
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-19457
>                 URL: https://issues.apache.org/jira/browse/HBASE-19457
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Appy
>            Assignee: Appy
>         Attachments: patch1, test-output.txt
>
>



