[
https://issues.apache.org/jira/browse/HBASE-19457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292097#comment-16292097
]
stack commented on HBASE-19457:
-------------------------------
Good one Appy. There are pieces that still need paving over. Looks like you
found one (I'm currently working on another).
When we truncate, do we delete the table and its regions from hbase:meta, or do
we just edit state? (Looks like we delete the regions... good).
Dang. Why is this Truncate Table not calling DeleteTable then CreateTable as
subprocedures? Why is it dup'ing the procedure bodies?
If a crash puts us into a whack state such that on resumption we do the wrong
thing, then the Procedure is not written properly.
What goes wrong when it goes to assign? Is it that we have not finished
editing/adding all regions to hbase:meta?
I've been working on Master startup. It reads meta and, if it finds regions in
OPEN state, it reassigns them, trying to retain their old locations. It will
also assign regions that are OFFLINE which, thinking about it now, is NOT what
we want.
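Roughly, the startup pass described above behaves like the toy model below.
This is a Python sketch, not Master code: plan_startup_assignments and the
(state, last_server) row shape are made-up names for illustration, and
leaving every other state alone is an assumption of the sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row shape: region name -> (state, last known server).
MetaRows = Dict[str, Tuple[str, Optional[str]]]


def plan_startup_assignments(rows: MetaRows) -> List[Tuple[str, Optional[str]]]:
    """Return (region, preferred_server) pairs the startup pass would assign."""
    plan: List[Tuple[str, Optional[str]]] = []
    for region, (state, last_server) in sorted(rows.items()):
        if state == "OPEN":
            # Reassign, trying to retain the old location.
            plan.append((region, last_server))
        elif state == "OFFLINE":
            # Also assigned, with no preferred server: the behaviour
            # questioned in the comment above.
            plan.append((region, None))
        # Any other state is left alone in this sketch (an assumption).
    return plan
```

With rows r1 in OPEN on rs1, r2 in OFFLINE, and r3 in CLOSED, the plan
contains r1 (preferring rs1) and r2, but not r3.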
Who is doing the assign of regions with empty state?
(Can talk tomorrow boss)
> Debugging flaky
> TestTruncateTableProcedure#testRecoveryAndDoubleExecutionPreserveSplits
> ---------------------------------------------------------------------------------------
>
> Key: HBASE-19457
> URL: https://issues.apache.org/jira/browse/HBASE-19457
> Project: HBase
> Issue Type: Bug
> Reporter: Appy
> Assignee: Appy
> Attachments: HBASE-19457.master.001.patch, patch1, test-output.txt
>
>
> Trying to explain the bug in a more general way where understanding of
> ProcedureV2 is not required.
> Truncating table operation:
> ....
> delete region states from meta
> delete table state from meta
> ....
> add new regions to meta with state null.
> ....crash
> ....recovery: TableStateManager treats table with null state as ENABLED. AM
> treats regions with null state as offline. Combined result - AM starts
> assigning the new regions from incomplete truncate operation.
> Fix: Mark the table as disabled instead of deleting its state.
> ----
> *patch1*
> Just added some logging to help with debugging:
> - 60s was too short a timeout, so increased it
> - Added some useful log statements
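
The crash sequence in the description above can be modelled in a few lines.
This is only a toy Python sketch, not HBase code: Meta, truncate_until_crash,
regions_assigned_on_recovery and demo are hypothetical names, and the model
assumes the table is already disabled when the truncate starts. The two
defaults it encodes (null table state read as ENABLED, null region state read
as offline and therefore assignable) come from the description.

```python
from enum import Enum
from typing import Dict, List, Optional


class TableState(Enum):
    ENABLED = "ENABLED"
    DISABLED = "DISABLED"


class Meta:
    """Toy stand-in for hbase:meta. A missing or None entry models null state."""

    def __init__(self) -> None:
        self.table_state: Dict[str, TableState] = {}
        # table name -> {region name -> region state (None means null)}
        self.regions: Dict[str, Dict[str, Optional[str]]] = {}


def effective_table_state(meta: Meta, table: str) -> TableState:
    # A table with null state is treated as ENABLED.
    state = meta.table_state.get(table)
    return TableState.ENABLED if state is None else state


def regions_assigned_on_recovery(meta: Meta, table: str) -> List[str]:
    # Regions with null state are treated as offline; in this model they
    # are assigned whenever their table counts as ENABLED.
    if effective_table_state(meta, table) is not TableState.ENABLED:
        return []
    return sorted(r for r, s in meta.regions.get(table, {}).items() if s is None)


def truncate_until_crash(meta: Meta, table: str, new_regions: List[str],
                         keep_table_disabled: bool) -> None:
    meta.regions[table] = {}  # delete region states from meta
    if keep_table_disabled:
        meta.table_state[table] = TableState.DISABLED  # proposed fix
    else:
        meta.table_state.pop(table, None)  # original: delete table state
    for region in new_regions:
        meta.regions[table][region] = None  # add new regions with state null
    # Simulated crash: return before any later step of the truncate runs.


def demo(keep_table_disabled: bool) -> List[str]:
    meta = Meta()
    meta.table_state["t1"] = TableState.DISABLED
    meta.regions["t1"] = {"old-region": "CLOSED"}
    truncate_until_crash(meta, "t1", ["new-a", "new-b"], keep_table_disabled)
    return regions_assigned_on_recovery(meta, "t1")
```

In this model, demo(False) returns both new regions (the premature assign),
while demo(True) returns an empty list because the table stays DISABLED across
the crash.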