Virag Kothari created HBASE-12125:
-------------------------------------
Summary: Add Hbck option to check and fix WAL's from replication
queue
Key: HBASE-12125
URL: https://issues.apache.org/jira/browse/HBASE-12125
Project: HBase
Issue Type: Bug
Components: Replication
Reporter: Virag Kothari
Assignee: Virag Kothari
In many cases the replication source discards a WAL file when it encounters
an exception while reading it. This can cause data loss, and the underlying
reason for the failed read remains hidden. The replication source should dump
the current WAL and move on to the next one only in certain scenarios.
This JIRA aims to add an hbck option that checks the WAL files of replication
queues for inconsistencies, along with an option to fix them.
The fix can be to remove the file from the replication queue in ZooKeeper and
from the memory of the replication source manager and the replication sources.
A region server endpoint call from the hbck client to the region server can be
used to achieve this.
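The two-sided removal described above can be illustrated with a small toy
model. This is only a sketch under stated assumptions: ToyReplicationQueues,
persisted and in_memory are hypothetical names, plain Python containers stand
in for ZooKeeper and for the region server's in-memory state, and no HBase or
ZooKeeper API is used.

```python
from collections import deque


class ToyReplicationQueues:
    """Hypothetical stand-in for the two places a queued WAL is tracked."""

    def __init__(self):
        # queue id -> list of WAL names (stands in for the queue kept in ZooKeeper)
        self.persisted = {}
        # queue id -> deque of WAL names (stands in for the replication source's memory)
        self.in_memory = {}

    def enqueue(self, queue_id, wal):
        self.persisted.setdefault(queue_id, []).append(wal)
        self.in_memory.setdefault(queue_id, deque()).append(wal)

    def remove_wal(self, queue_id, wal):
        """Remove wal from both views; return True if it was found in either."""
        found = False
        for store in (self.persisted, self.in_memory):
            entries = store.get(queue_id)
            if entries is not None and wal in entries:
                entries.remove(wal)
                found = True
        return found
```

The point of the sketch is that a fix has to update both views together;
removing the entry from only one of them would leave the queue state
inconsistent.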
Hbck can be configured with the following options:
-softCheckReplicationWAL: Tries to open only the oldest WAL (the WAL currently
being read by the replication source) from the replication queue. If a
position is associated with it, the check also seeks to that position and
reads an entry from there.
-hardCheckReplicationWAL: Checks all WAL paths from the replication queues by
reading them completely to make sure they are OK.
-fixMissingReplicationWAL: Removes from the replication queues the WALs that
are not present on HDFS.
-fixCorruptedReplicationWAL: Removes from the replication queues the WALs that
are corrupted (based on the findings from softCheck/hardCheck). The WALs are
also moved to a quarantine dir.
-rollAndFixCorruptedReplicationWAL: If the current WAL is corrupted, it is
first rolled over and then handled in the same way as with the
-fixCorruptedReplicationWAL option.
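To make the difference between the soft and hard checks concrete, here is a
minimal sketch. It does not use the real WAL format or any HBase API: the
length-prefixed file layout and the names write_toy_wal, read_entry,
soft_check and hard_check are all assumptions made up for illustration.

```python
import os
import struct

# Hypothetical toy format: each entry is a 4-byte big-endian length + payload.
HEADER = struct.Struct(">I")


def write_toy_wal(path, entries):
    """Write a list of bytes payloads as a toy WAL file."""
    with open(path, "wb") as f:
        for payload in entries:
            f.write(HEADER.pack(len(payload)))
            f.write(payload)


def read_entry(f):
    """Read one entry; return None at clean EOF, raise ValueError if truncated."""
    header = f.read(HEADER.size)
    if not header:
        return None
    if len(header) < HEADER.size:
        raise ValueError("truncated length prefix")
    (length,) = HEADER.unpack(header)
    payload = f.read(length)
    if len(payload) < length:
        raise ValueError("truncated entry payload")
    return payload


def soft_check(path, position=0):
    """Open the file, seek to the saved position, and read a single entry."""
    if not os.path.exists(path):
        return "missing"
    try:
        with open(path, "rb") as f:
            f.seek(position)
            read_entry(f)
        return "ok"
    except (OSError, ValueError):
        return "corrupted"


def hard_check(path):
    """Read every entry in the file from start to end."""
    if not os.path.exists(path):
        return "missing"
    try:
        with open(path, "rb") as f:
            while read_entry(f) is not None:
                pass
        return "ok"
    except (OSError, ValueError):
        return "corrupted"
```

In this toy model a file whose last entry is truncated passes soft_check at
position 0 but fails hard_check, which is the kind of case that reading the
file completely is meant to catch. A "missing" result would correspond to the
-fixMissingReplicationWAL case and a "corrupted" result to the
-fixCorruptedReplicationWAL case.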