[ https://issues.apache.org/jira/browse/HBASE-26408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437418 ]

Rushabh Shah commented on HBASE-26408:
--------------------------------------

> I agree that it's possible for postWALWrite to fail, and that should also 
> probably not abort.

[~bbeaudreault] Trying to understand why it shouldn't abort. postWALWrite 
failed, but the entry has already been written to HDFS/the WAL. In 
HRegion#append the write will then fail, causing the edit to be rolled back 
from the memstore, and again the primary and replicated clusters will be out 
of sync.
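To make the ordering concrete, here is a minimal, hypothetical condensation of 
the path I mean. None of these types are the real HBase classes; they only 
illustrate that the hook runs after the entry is durable, so a failure there 
unwinds the memstore while the WAL (and any replication peers) keep the edit:

{code:java}
import java.io.IOException;

// Hypothetical sketch only; not actual HRegion/FSHLog internals.
public class PostWalWriteOrdering {

  interface Memstore {
    void add(String edit);
    void rollback(String edit);
  }

  interface Wal {
    // The entry is durable in HDFS once this returns.
    void appendAndSync(String edit) throws IOException;
  }

  interface CoprocessorHost {
    void postWALWrite(String edit) throws IOException;
  }

  private final Memstore memstore;
  private final Wal wal;
  private final CoprocessorHost coprocessorHost;

  PostWalWriteOrdering(Memstore m, Wal w, CoprocessorHost c) {
    this.memstore = m;
    this.wal = w;
    this.coprocessorHost = c;
  }

  void append(String edit) throws IOException {
    memstore.add(edit);        // 1. edit applied to the memstore
    wal.appendAndSync(edit);   // 2. edit is durable and may already ship to peers
    try {
      coprocessorHost.postWALWrite(edit); // 3. hook fails here...
    } catch (IOException e) {
      memstore.rollback(edit); // 4. ...memstore rolls back, but the WAL keeps
      throw e;                 //    the edit, so primary and replicas diverge
    }
  }
}
{code}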

> Aborting to preserve WAL as source of truth can abort in recoverable 
> situations
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-26408
>                 URL: https://issues.apache.org/jira/browse/HBASE-26408
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> HBASE-26195 added an important feature to avoid data corruption by preserving 
> the WAL as a source of truth when WAL sync fails. See that issue for 
> background.
> That issue's primary driver was a TimeoutIOException, but the solution was to 
> catch and abort on Throwable. The idea here was that we can't anticipate all 
> possible failures, so we should err on the side of data correctness. As 
> pointed out by [~rushabh.shah] in his comments, this solution has the 
> potential to lose HBase capacity quickly in "not very grave" situations. It 
> would be good to add an escape hatch for explicit known cases, one of which I 
> encountered recently:
> I rolled this out to some of our test clusters, most of which are small. 
> Afterward, a rolling restart of the DataNodes caused the following 
> IOException: "Failed to replace a bad datanode on the existing pipeline due 
> to no more good datanodes being available to try..."
> If you know HDFS pipeline recovery, this error will look familiar. Basically, 
> the restarted DataNodes caused pipeline failures; those DataNodes were added 
> to an internal exclude list that never gets cleared, and eventually there 
> were no more nodes to choose from, resulting in the error above.
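> For context, the replacement behavior above is controlled by the HDFS 
> client's replace-datanode-on-failure settings. A sketch of the knobs involved 
> (the values shown are illustrative, and best-effort trades durability for 
> availability, so this is not a recommendation):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> public class PipelineRecoveryKnobs {
>   public static void main(String[] args) {
>     Configuration conf = new Configuration();
>     // Whether the client tries to replace a failed datanode in the pipeline at all.
>     conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
>     // DEFAULT replaces only under certain conditions (e.g. replication >= 3);
>     // ALWAYS and NEVER are the other policies.
>     conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
>     // true keeps writing with the remaining datanodes instead of throwing
>     // "Failed to replace a bad datanode..." once no candidates remain.
>     conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.best-effort", true);
>   }
> }
> {code}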
> This error is pretty explicit, and at this point the DFSOutputStream for the 
> WAL is dead. I think it is reasonable to simply bubble this one up rather 
> than abort the RegionServer, just failing and rolling back the writes.
> What do people think about starting an allowlist of known good error messages 
> for which we do not trigger an abort of the RS? Something like this:
> {code:java}
> } catch (Throwable t) {
>   // WAL sync failed. Aborting to avoid a mismatch between the memstore,
>   // WAL, and any replicated clusters.
>   if (!walSyncSuccess && !allowedException(t)) {
>     rsServices.abort("WAL sync failed, aborting to preserve WAL as source of truth", t);
>   }
> }
> ... snip ...
> private boolean allowedException(Throwable t) {
>   // Guard against a null message to avoid an NPE in this check.
>   String msg = t.getMessage();
>   return msg != null && msg.startsWith("Failed to replace a bad datanode");
> }
> {code}
> We could of course make this configurable if people like, or just add to it 
> over time.
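> One possible shape for that, assuming a hypothetical config key (the property 
> name below is made up, not an existing HBase setting):
> {code:java}
> import java.util.Arrays;
> import org.apache.hadoop.conf.Configuration;
>
> public class WalSyncAbortAllowlist {
>   // Hypothetical key; comma-separated list of allowed message prefixes.
>   private static final String KEY = "hbase.regionserver.wal.sync.abort.allowlist";
>
>   private final String[] allowedPrefixes;
>
>   WalSyncAbortAllowlist(Configuration conf) {
>     // Default to the known HDFS pipeline-recovery error.
>     this.allowedPrefixes = conf.getStrings(KEY, "Failed to replace a bad datanode");
>   }
>
>   boolean allowedException(Throwable t) {
>     String msg = t.getMessage();
>     return msg != null && Arrays.stream(allowedPrefixes).anyMatch(msg::startsWith);
>   }
> }
> {code}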



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
