[
https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154441#comment-14154441
]
Virag Kothari commented on HBASE-12125:
---------------------------------------
A WAL roll on the region server would be required only if the current WAL (the
WAL being written to) is corrupted. So -fixCorruptedReplicationWAL on its own
can be useful when we know that the current WAL is ok.
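
As a rough illustration of that choice, here is a minimal Python sketch. It is
not hbck code: the function name and its inputs are hypothetical, and it only
assumes that a prior check has produced the set of corrupted WAL paths and that
the path of the WAL currently being written to is known.

```python
from typing import Iterable, Optional


def choose_fix_option(corrupted_wals: Iterable[str], current_wal: str) -> Optional[str]:
    """Pick the proposed hbck option for a set of corrupted WALs (hypothetical helper)."""
    corrupted = set(corrupted_wals)
    if not corrupted:
        # Nothing was reported as corrupted, so no fix option is needed.
        return None
    if current_wal in corrupted:
        # The WAL being written to is affected, so it must be rolled first.
        return "-rollAndFixCorruptedReplicationWAL"
    # Only older WALs are affected; no roll is required.
    return "-fixCorruptedReplicationWAL"
```
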
> Add Hbck option to check and fix WAL's from replication queue
> -------------------------------------------------------------
>
> Key: HBASE-12125
> URL: https://issues.apache.org/jira/browse/HBASE-12125
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Reporter: Virag Kothari
> Assignee: Virag Kothari
>
> The replication source will discard the WAL file in many cases when it
> encounters an exception reading it. This can cause data loss,
> and the underlying reason for the failed read remains hidden. Only in certain
> scenarios should the replication source discard the current WAL and move to
> the next one.
> This JIRA aims to have an hbck option to check the WAL files of replication
> queues for any inconsistencies and also provide an option to fix them.
> The fix can be to remove the file from the replication queue in ZooKeeper and
> from the memory of the replication source manager and replication sources.
> A region server endpoint call from the hbck client to the region server can
> be used to achieve this.
> Hbck can be configured with the following options:
> -softCheckReplicationWAL: Tries to open only the oldest WAL (the WAL
> currently read by the replication source) from the replication queue. If
> there is a position associated, it also seeks to that position and reads an
> entry from there.
> -hardCheckReplicationWAL: Checks all WAL paths from the replication queues by
> reading them completely to make sure they are ok.
> -fixMissingReplicationWAL: Removes from the replication queues the WALs which
> are not present on HDFS.
> -fixCorruptedReplicationWAL: Removes from the replication queues the WALs
> which are corrupted (based on the findings from softCheck/hardCheck). The
> WALs are also moved to a quarantine dir.
> -rollAndFixCorruptedReplicationWAL: If the current WAL is corrupted, it is
> first rolled over and then handled in the same way as with the
> -fixCorruptedReplicationWAL option.
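
The check and fix options above can be sketched with a small in-memory model.
This is only an illustration under stated assumptions, not the HBase
implementation: `WalFile`, `ReplicationQueue`, `soft_check`, `hard_check` and
`fix` are hypothetical names, a dict stands in for HDFS, an unreadable entry is
modelled as `None`, and the saved position is an entry index rather than a byte
offset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class WalFile:
    """Hypothetical stand-in for a WAL: None marks an entry that cannot be read."""
    entries: List[Optional[str]]


@dataclass
class ReplicationQueue:
    wal_paths: List[str]  # oldest WAL first
    position: int = 0     # entry index reached in the oldest WAL (assumption)


Finding = Tuple[str, str]  # (wal path, "missing" or "corrupted")


def soft_check(queue: ReplicationQueue, fs: Dict[str, WalFile]) -> List[Finding]:
    """Open only the oldest WAL and read one entry at the saved position."""
    if not queue.wal_paths:
        return []
    path = queue.wal_paths[0]
    wal = fs.get(path)
    if wal is None:
        return [(path, "missing")]
    if queue.position < len(wal.entries) and wal.entries[queue.position] is None:
        return [(path, "corrupted")]
    return []


def hard_check(queue: ReplicationQueue, fs: Dict[str, WalFile]) -> List[Finding]:
    """Read every WAL in the queue completely."""
    findings: List[Finding] = []
    for path in queue.wal_paths:
        wal = fs.get(path)
        if wal is None:
            findings.append((path, "missing"))
        elif any(entry is None for entry in wal.entries):
            findings.append((path, "corrupted"))
    return findings


def fix(queue: ReplicationQueue, fs: Dict[str, WalFile], findings: List[Finding],
        quarantine: Dict[str, WalFile],
        fix_missing: bool = True, fix_corrupted: bool = True) -> None:
    """Drop bad WALs from the queue; corrupted files are moved to quarantine."""
    for path, problem in findings:
        if path not in queue.wal_paths:
            continue
        wanted = (problem == "missing" and fix_missing) or \
                 (problem == "corrupted" and fix_corrupted)
        if not wanted:
            continue
        if queue.wal_paths[0] == path:
            queue.position = 0  # the saved position belonged to the removed WAL
        queue.wal_paths.remove(path)
        if problem == "corrupted":
            quarantine[path] = fs.pop(path)
```

In this model the soft check is cheap but can miss corruption beyond the entry
it reads, which is why the hard check reads every WAL in full.
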