[ 
https://issues.apache.org/jira/browse/HBASE-21577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766539#comment-16766539
 ] 

Sergey Shelukhin commented on HBASE-21577:
------------------------------------------

Hmm, I can't find tests for region server abort in general. Do they exist 
somewhere?
As for the numbers, we've seen an RS stay stuck in abort until the hard timeout 
(20 minutes, iirc) kicks in. With this change we no longer see that in 
DroppedSnapshot cases, although we didn't gather statistics.

> do not close regions when RS is dying due to a broken WAL
> ---------------------------------------------------------
>
>                 Key: HBASE-21577
>                 URL: https://issues.apache.org/jira/browse/HBASE-21577
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver, wal
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>         Attachments: HBASE-21577.master.001.patch, 
> HBASE-21577.master.002.patch
>
>
> See HBASE-21576. DroppedSnapshot can be an FS failure; also, when WAL is 
> broken, some regions whose flushes are already in flight keep retrying, 
> resulting in minutes-long shutdown times. Since the WAL will be replayed 
> anyway, flushing regions doesn't provide much benefit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
