[
https://issues.apache.org/jira/browse/HBASE-19457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290632#comment-16290632
]
Appy commented on HBASE-19457:
------------------------------
After more debugging, I think I finally have a fix (sorry for being slow, I'm
just beginning to understand AM).
So the issue is:
We delete the table's state from meta (in step [TRUNCATE_TABLE_REMOVE_FROM_META
|https://github.com/apache/hbase/blob/7466e64abb2c68c8a0f40f6051e4b5bf550e69bd/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/TruncateTableProcedure.java#L102]).
On recovery, TableStateManager#fixTableStates assumes that a missing state
means the table is enabled
([here|https://github.com/apache/hbase/blob/7466e64abb2c68c8a0f40f6051e4b5bf550e69bd/hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java#L218]).
Later we add regions to meta and crash after that. On recovery, AM sees these
regions, looks up the table state, finds it enabled, and starts assigning
them, which screws things up.
A simple fix here would be: don't delete the table state from meta; just let
it remain DISABLED.
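To make the sequence above concrete, here is a rough toy model of it. None of
these types are HBase classes: the two maps stand in for the table-state
column and the region rows in meta, and the method names only mirror the
roles described in this comment. It also assumes the table state is DISABLED
before the truncate step removes it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a sketch of the failure sequence, not HBase code.
final class TruncateRecoverySketch {
    enum State { ENABLED, DISABLED }

    // Stand-in for the table-state column in meta: table name -> state.
    final Map<String, State> tableStates = new HashMap<>();
    // Stand-in for region rows in meta: region name -> owning table.
    final Map<String, String> regionRows = new HashMap<>();

    // Mimics the recovery behaviour described above:
    // a table with no recorded state is treated as ENABLED.
    void fixTableStates(List<String> knownTables) {
        for (String table : knownTables) {
            tableStates.putIfAbsent(table, State.ENABLED);
        }
    }

    // Regions the toy "AM" would try to assign on recovery.
    List<String> regionsToAssign() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : regionRows.entrySet()) {
            if (tableStates.get(e.getValue()) == State.ENABLED) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // Replays: (optionally) delete state, recover, add regions, crash, recover.
    static List<String> simulate(boolean deleteStateOnTruncate) {
        TruncateRecoverySketch meta = new TruncateRecoverySketch();
        meta.tableStates.put("t1", State.DISABLED);   // assumed starting state
        if (deleteStateOnTruncate) {
            meta.tableStates.remove("t1");            // the step in question
        }
        meta.fixTableStates(List.of("t1"));           // recovery fills in ENABLED
        meta.regionRows.put("t1,region-a", "t1");     // regions added, then crash
        return meta.regionsToAssign();                // what the next recovery sees
    }

    public static void main(String[] args) {
        System.out.println(simulate(true));   // prints [t1,region-a]
        System.out.println(simulate(false));  // prints []
    }
}
```

With the state deleted, the half-created region gets picked up for
assignment; with the state left as DISABLED, nothing is assigned.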
-------
But CreateTableProcedure also adds regions to meta and crashes. Why don't we
see the same issue there?
It adds region rows to meta, but does not add any row for the table.
On recovery, when AM looks for the table state corresponding to those regions,
TSM#getTableState() throws TableNotFoundException, which gets caught
[here|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java#L135],
etc.
The end result is that it ignores those regions.
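The create-table case can be sketched the same way. This is a toy model under
stated assumptions, not the real code path: TableNotFound below is a
hypothetical stand-in for the exception named above, and the maps stand in
for table rows and region rows in meta.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: shows why regions without a table row end up ignored.
final class CreateRecoverySketch {
    // Hypothetical stand-in for TableNotFoundException; not the HBase class.
    static final class TableNotFound extends Exception {
        TableNotFound(String table) { super(table); }
    }

    // Looks up the table's state; throws if meta has no row for the table.
    static boolean isEnabled(Map<String, Boolean> tableRows, String table)
            throws TableNotFound {
        Boolean enabled = tableRows.get(table);
        if (enabled == null) {
            throw new TableNotFound(table);
        }
        return enabled;
    }

    // Regions the toy "AM" would assign; a failed lookup skips the region.
    static List<String> regionsToAssign(Map<String, Boolean> tableRows,
                                        Map<String, String> regionRows) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : regionRows.entrySet()) {
            try {
                if (isEnabled(tableRows, e.getValue())) {
                    out.add(e.getKey());
                }
            } catch (TableNotFound ignored) {
                // No table row yet: leave the region alone.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> regionRows = new HashMap<>();
        regionRows.put("t2,region-a", "t2");          // region row written, then crash
        Map<String, Boolean> tableRows = new HashMap<>();  // no row for t2
        System.out.println(regionsToAssign(tableRows, regionRows));  // prints []
    }
}
```

The difference from the truncate case is that here the lookup fails outright
instead of falling back to "enabled".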
----
Some bigger questions to ponder:
1) Should we really treat a missing state column as enabled? Assuming disabled
is probably the more conservative and better choice, since it won't screw up
the cluster. (The only other place that deletes the state column is hbck.)
2) Shouldn't new regions always be added with state closed? (dev thread:
http://mail-archives.apache.org/mod_mbox/hbase-dev/201712.mbox/browser)
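One possible reading of the two questions above, as a small sketch. Everything
here is hypothetical: the enums, the method names, and the rule that a CLOSED
region is never auto-assigned on recovery are illustrations of the proposals,
not existing HBase behaviour.

```java
import java.util.Optional;

// Toy policy only: illustrates the two proposals, not HBase code.
final class ConservativeDefaultsSketch {
    enum TableState { ENABLED, DISABLED }
    enum RegionState { OFFLINE, CLOSED }

    // Question 1: the state assumed when meta has none is a parameter here.
    static TableState resolve(Optional<TableState> stored, TableState whenMissing) {
        return stored.orElse(whenMissing);
    }

    // Question 2 (assumed rule): a region recorded as CLOSED is not
    // picked up for assignment, whatever the table state resolves to.
    static boolean shouldAssign(Optional<TableState> stored,
                                TableState whenMissing,
                                RegionState region) {
        return resolve(stored, whenMissing) == TableState.ENABLED
                && region != RegionState.CLOSED;
    }

    public static void main(String[] args) {
        Optional<TableState> missing = Optional.empty();
        // Current assumption: missing state means enabled.
        System.out.println(shouldAssign(missing, TableState.ENABLED, RegionState.OFFLINE));  // prints true
        // Proposal 1: missing state means disabled.
        System.out.println(shouldAssign(missing, TableState.DISABLED, RegionState.OFFLINE)); // prints false
        // Proposal 2: new regions are added as CLOSED.
        System.out.println(shouldAssign(missing, TableState.ENABLED, RegionState.CLOSED));   // prints false
    }
}
```

Under either proposal the half-written regions would be left alone on
recovery in this toy model.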
> Debugging flaky
> TestTruncateTableProcedure#testRecoveryAndDoubleExecutionPreserveSplits
> ---------------------------------------------------------------------------------------
>
> Key: HBASE-19457
> URL: https://issues.apache.org/jira/browse/HBASE-19457
> Project: HBase
> Issue Type: Bug
> Reporter: Appy
> Assignee: Appy
> Attachments: patch1, test-output.txt
>
>