[jira] [Commented] (HBASE-21577) do not close regions when RS is dying due to a broken WAL

Sean Busbey (JIRA) Mon, 11 Feb 2019 22:40:46 -0800


    [ https://issues.apache.org/jira/browse/HBASE-21577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765730#comment-16765730 ]


Sean Busbey commented on HBASE-21577:
-------------------------------------

The approach makes sense. I'm looking through the places where we throw 
{{DroppedSnapshotException}}, and I agree they all look like places where 
making more FS requests isn't going to go well.

Can we quantify how much faster the RS goes down with this in place?

It looks like there are no tests covering an RS abort triggered by a DSE 
({{DroppedSnapshotException}})?

> do not close regions when RS is dying due to a broken WAL
> ---------------------------------------------------------
>
>                 Key: HBASE-21577
>                 URL: https://issues.apache.org/jira/browse/HBASE-21577
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver, wal
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>         Attachments: HBASE-21577.master.001.patch, 
> HBASE-21577.master.002.patch
>
>
> See HBASE-21576. DroppedSnapshot can be an FS failure; also, when the WAL is 
> broken, some regions whose flushes are already in flight keep retrying, 
> resulting in minutes-long shutdown times. Since the WAL will be replayed 
> anyway, flushing regions doesn't provide much benefit.



