[ https://issues.apache.org/jira/browse/HBASE-21611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HBASE-21611:
-------------------------------------
Summary: REGION_STATE_TRANSITION_CONFIRM_CLOSED should interact better with
crash procedure (was: REGION_STATE_TRANSITION_CONFIRM_CLOSED should interact
better with crash procedure.)
> REGION_STATE_TRANSITION_CONFIRM_CLOSED should interact better with crash
> procedure
> ----------------------------------------------------------------------------------
>
> Key: HBASE-21611
> URL: https://issues.apache.org/jira/browse/HBASE-21611
> Project: HBase
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Priority: Major
>
> 1) Not a bug per se, since HDFS is not supposed to lose files; it is just a
> bit fragile.
> When a dead server's WAL directory is deleted (due to manual intervention
> or some issue with HDFS) while some regions are in CLOSING state on that
> server, they get stuck forever in a REGION_STATE_TRANSITION_CONFIRM_CLOSED ->
> REGION_STATE_TRANSITION_CLOSE -> "give up and mark the procedure as complete,
> the parent procedure will take care of this" loop. There's no crash procedure
> for the server, so nothing ever resolves the stuck regions.
> 2) Under normal circumstances, when a large WAL is being split, this same
> loop keeps spamming the logs and wasting resources until the crash procedure
> completes. There's no reason for it to retry; it should just wait for the
> crash procedure.
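The loop described in the issue can be illustrated with a small simulation. This is only a sketch under stated assumptions, not HBase code: the class CloseRetrySketch, the method simulate, the tick-based timing, and the waitInsteadOfRetry flag are all hypothetical names invented for illustration. It counts how many times the CLOSE step is re-run before the region is confirmed closed, comparing the retry behavior with simply waiting for the crash procedure. An empty crashProcedureDoneAt models case 1, where no crash procedure exists.

```java
import java.util.OptionalInt;

// Hypothetical, simplified model of the loop in the issue; none of these
// names are real HBase APIs.
class CloseRetrySketch {

    /** Outcome of simulating one region's close transition. */
    static final class Result {
        final int closeAttempts;  // how many times the CLOSE step was re-run
        final boolean confirmed;  // whether the region was ever confirmed closed

        Result(int closeAttempts, boolean confirmed) {
            this.closeAttempts = closeAttempts;
            this.confirmed = confirmed;
        }
    }

    /**
     * @param crashProcedureDoneAt tick at which the crash procedure finishes,
     *                             or empty if no crash procedure exists (case 1)
     * @param waitInsteadOfRetry   false models the retry loop from the issue;
     *                             true models waiting for the crash procedure
     * @param maxTicks             cap so that a stuck region ends the simulation
     */
    static Result simulate(OptionalInt crashProcedureDoneAt,
                           boolean waitInsteadOfRetry,
                           int maxTicks) {
        int closeAttempts = 0;
        for (int tick = 0; tick < maxTicks; tick++) {
            // CONFIRM_CLOSED: assumed to succeed only once the crash
            // procedure for the dead server has finished.
            if (crashProcedureDoneAt.isPresent()
                    && tick >= crashProcedureDoneAt.getAsInt()) {
                return new Result(closeAttempts, true);
            }
            if (!waitInsteadOfRetry) {
                // Back to CLOSE: the server is dead, so the attempt gives up
                // and the state machine returns to CONFIRM_CLOSED.
                closeAttempts++;
            }
            // Otherwise stay suspended until the next tick.
        }
        return new Result(closeAttempts, false); // never confirmed: stuck
    }

    public static void main(String[] args) {
        Result retry = simulate(OptionalInt.of(5), false, 100);
        Result wait = simulate(OptionalInt.of(5), true, 100);
        Result stuck = simulate(OptionalInt.empty(), false, 100);
        System.out.println("retry: attempts=" + retry.closeAttempts
                + " confirmed=" + retry.confirmed);
        System.out.println("wait:  attempts=" + wait.closeAttempts
                + " confirmed=" + wait.confirmed);
        System.out.println("stuck: attempts=" + stuck.closeAttempts
                + " confirmed=" + stuck.confirmed);
    }
}
```

In this model, waiting removes the wasted CLOSE attempts from case 2, but it does not by itself resolve case 1: with no crash procedure, the region is never confirmed closed under either strategy.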