Yu Li reassigned HBASE-20156:

    Assignee: Yu Li

> Allow regionserver to live during HDFS failure
> ----------------------------------------------
>                 Key: HBASE-20156
>                 URL: https://issues.apache.org/jira/browse/HBASE-20156
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Major
> Currently, if something goes wrong with HDFS, for example NN fencing or
> entering safe mode, the RS aborts itself immediately after detecting the
> problem (such as a failed log roll or flush). On a large-scale cluster with
> a dense write workload, there will then be a huge amount of WAL to split and
> replay when HDFS is back, and recovery may take tens of minutes or even
> hours (we have experienced this more than once in production; there are
> always surprises we never expected, such as an unstable power supply for the NN).
> Here we propose adding an option that lets the RS stay alive during an HDFS
> failure instead of aborting. While HDFS is down, the RS would throw
> exceptions to clients indicating that it is out of service, and it could
> recover as soon as HDFS is back.
> This would also make it possible to restart HDFS in some extreme cases, and
> would allow us to survive if anything goes wrong during an HDFS upgrade.

This message was sent by Atlassian JIRA
