[
https://issues.apache.org/jira/browse/HBASE-19457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284510#comment-16284510
]
Appy edited comment on HBASE-19457 at 12/14/17 9:37 AM:
--------------------------------------------------------
Although the test timed out after 60 sec, it had at least reached the
TRUNCATE_TABLE_ADD_TO_META state.
One bug I see here: if the master crashes after performing
TRUNCATE_TABLE_ADD_TO_META, then when the new AM comes up, it reads back meta
and assumes that these newly added regions are offline and need to be
reassigned (see logs below). But that's wrong, since the recovered
TruncateTableProcedure will try to do the same.
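To make the sequence above concrete, here is a toy Python model of it. This is not HBase code: every function and data structure is made up for illustration, and it assumes that rows written by the add-to-meta step carry no assignment state, which the restarted AM then treats as "offline". It only shows that both actors end up queueing an assign for the same regions.

```python
# Toy model only: none of these names are HBase APIs.

def truncate_add_to_meta(meta, new_regions):
    """Models TRUNCATE_TABLE_ADD_TO_META: rows are written, nothing is assigned yet."""
    for region in new_regions:
        meta[region] = None  # assumption: no assignment state recorded


def am_startup_scan(meta):
    """Models the new AM reading meta after a restart and queueing
    every region it believes is offline."""
    return [("AM", region) for region, state in meta.items() if state is None]


def recovered_truncate_assigns(new_regions):
    """Models the recovered truncate procedure resuming with its own assign step."""
    return [("TruncateTableProcedure", region) for region in new_regions]


def regions_queued_twice(requests):
    """Returns the regions that more than one actor wants to assign."""
    owners = {}
    for owner, region in requests:
        owners.setdefault(region, set()).add(owner)
    return sorted(r for r, who in owners.items() if len(who) > 1)


meta = {}                      # stand-in for hbase:meta
new_regions = ["r1", "r2"]     # hypothetical region names
truncate_add_to_meta(meta, new_regions)
# The master crashes here; on restart both actors act on the same meta rows.
requests = am_startup_scan(meta) + recovered_truncate_assigns(new_regions)
print(regions_queued_twice(requests))  # prints ['r1', 'r2']
```

In this model every newly added region is claimed by both the AM and the recovered procedure, which is the conflict described above.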
*Edit*: Striking out the text below. The AM shouldn't be assigning regions
from incomplete actions, so the analysis of the case when it does so (the
stuff below) is irrelevant. The fix here should be the truncate procedure
managing the state of the table/regions in meta better, AND/OR the assignment
manager handling default cases in a better way.
--Since there's a table lock, they won't execute in parallel, but I think
it'll be:--
- --Fine if the truncate proc executes first. The assign procs will complete
immediately, saying the regions are assigned.--
- --Bad if the assign procs execute first, since when the truncate proc
executes later, it'll meddle with hbase:meta assuming the table is disabled,
try to assign, and fail.--
--An even worse scenario would be: if 1) the master crashes, and 2) the region
containing meta crashes in between puts and only some puts succeed,--
--then the new AM will bring only part of the table's regions online (since
even meta doesn't know about all the regions).--
Waiting for more runs after increasing the timeout to get better logs.
> Debugging flaky TestTruncateTableProcedure
> ------------------------------------------
>
> Key: HBASE-19457
> URL: https://issues.apache.org/jira/browse/HBASE-19457
> Project: HBase
> Issue Type: Bug
> Reporter: Appy
> Assignee: Appy
> Attachments: patch1, test-output.txt
>
>