[
https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796839#comment-17796839
]
Bryan Beaudreault commented on HBASE-28260:
-------------------------------------------
Re: replication, I guess it would increase network traffic a bit on the source
cluster if there were not a local replica. That could be a consideration here.
> Possible data loss in WAL after RegionServer crash
> --------------------------------------------------
>
> Key: HBASE-28260
> URL: https://issues.apache.org/jira/browse/HBASE-28260
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Priority: Major
>
> We recently had a production incident:
> # RegionServer crashes, but local DataNode lives on
> # WAL lease recovery kicks in
> # NameNode reconstructs the block during lease recovery (which results in a
> new genstamp). It chooses the replica on the local DataNode as the primary.
> # Local DataNode reconstructs the block, so NameNode registers the new
> genstamp.
> # Local DataNode and the underlying host dies, before the new block could be
> replicated to other replicas.
> This leaves us with a missing block, because the new genstamp block has no
> replicas. The old replicas still remain, but are considered corrupt due to
> GENSTAMP_MISMATCH.
> Thankfully we were able to confirm that the length of the corrupt blocks was
> identical to the newly constructed and lost block. Further, the file in
> question was only 1 block. So we downloaded one of those corrupt block files
> and used {{hdfs dfs -put -f}} to force that block to replace the file in
> HDFS. So in this case we had no actual data loss, but it could have happened
> easily if the file was more than 1 block or the replicas weren't fully in
> sync prior to reconstruction.
> In order to avoid this issue, we should avoid writing WAL blocks to the
> local datanode. We can use CreateFlag.NO_LOCAL_WRITE for this. Hat tip to
> [~weichiu] for pointing this out.
> During reading of WALs we already reorder blocks so as to avoid reading from
> the local datanode, but avoiding writing there altogether would be better.
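
To make the failure sequence in the description concrete, here is a toy model in plain Java. It is not HDFS code: the class GenstampToyModel, its methods, and the DataNode names are all hypothetical, and recovery is simplified so that only the primary's replica receives the new genstamp, which is the state the incident was left in.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one block's replicas and generation stamps. NOT real HDFS code. */
final class GenstampToyModel {
    /** Genstamp the (toy) NameNode currently expects for the block. */
    private long expectedGenstamp;
    /** Live DataNode name -> genstamp of the replica it holds. */
    private final Map<String, Long> replicas = new HashMap<>();

    GenstampToyModel(long genstamp, String... dataNodes) {
        this.expectedGenstamp = genstamp;
        for (String dn : dataNodes) {
            replicas.put(dn, genstamp);
        }
    }

    /** Simplified lease recovery: only the primary's replica gets the new genstamp. */
    void recoverOn(String primary) {
        if (!replicas.containsKey(primary)) {
            throw new IllegalArgumentException("no live replica on " + primary);
        }
        expectedGenstamp++;
        replicas.put(primary, expectedGenstamp);
    }

    /** The DataNode (and its host) dies, taking its replica with it. */
    void kill(String dataNode) {
        replicas.remove(dataNode);
    }

    /** Replicas whose genstamp matches what the NameNode expects. */
    long validReplicas() {
        return replicas.values().stream().filter(g -> g == expectedGenstamp).count();
    }

    /** Replicas that would be flagged with a genstamp mismatch. */
    long staleReplicas() {
        return replicas.size() - validReplicas();
    }

    public static void main(String[] args) {
        GenstampToyModel block =
            new GenstampToyModel(1000L, "local-dn", "remote-dn-1", "remote-dn-2");
        block.recoverOn("local-dn"); // local replica chosen as primary
        block.kill("local-dn");      // host dies before re-replication
        System.out.println("valid=" + block.validReplicas()
            + " stale=" + block.staleReplicas()); // prints "valid=0 stale=2"
    }
}
```

In the same toy model, a block whose replicas all live on remote DataNodes is unaffected by losing the RegionServer's host, which is the intuition behind not writing WAL blocks locally.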