[ https://issues.apache.org/jira/browse/HBASE-11580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345885#comment-14345885 ]
Hudson commented on HBASE-11580:
--------------------------------
FAILURE: Integrated in HBase-TRUNK #6202 (See [https://builds.apache.org/job/HBase-TRUNK/6202/])
HBASE-11580 Failover handling for secondary region replicas (enis: rev 9899aab12b419144f7f8a8280bedbccc68ee7452)
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/RemoveColumnAction.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestRegionReplicaReplicationEndpointNoMaster.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/executor/EventType.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionReplayEvents.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionReplicaFailover.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestRegionReplicaReplication.java
* hbase-protocol/src/main/protobuf/WAL.proto
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestPrefixTree.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientSmallScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/ServerRegionReplicaUtil.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/util/RetryCounter.java
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/FlushRegionCallable.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/RegionReplicaFlushHandler.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/executor/ExecutorType.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* hbase-protocol/src/main/protobuf/Admin.proto
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionUtils.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionAdminServiceCallable.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/RegionReplicaReplicationEndpoint.java
> Failover handling for secondary region replicas
> -----------------------------------------------
>
> Key: HBASE-11580
> URL: https://issues.apache.org/jira/browse/HBASE-11580
> Project: HBase
> Issue Type: Sub-task
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.1.0
>
> Attachments: hbase-11580_v2.patch, hbase-11580_v3.patch
>
>
> With the async WAL approach (HBASE-11568), the edits are not persisted (to
> the WAL) in the secondary region replicas. However, this means that we have
> to deal with secondary region replica failures.
> We could re-replicate the edits from the primary to the secondary when the
> secondary region is opened on another server, but this would mean setting up
> a replication queue again and holding on to the WALs for longer.
> Instead, we can design it so that the edits for the secondaries are not
> persisted to the WAL, and if the secondary replica fails over, it will not
> start serving reads until it has guaranteed that it has all the past data.
> To guarantee that the secondary replica has all the edits before serving
> reads, we can use flush and region open markers. Whenever a region open
> event is seen, it writes all the files at the time of opening to the WAL
> (HBASE-11512). In case of a flush, the flushed file is written as well, and
> the secondary replica can do an ls for the store files and pick up all the
> files before the seqId of the flushed file. So, in this design, the secondary
> replica will wait until it sees and replays a flush or region open marker
> from the primary's WAL, and then start serving. To speed up replica opening
> time, we can trigger a flush on the primary whenever the secondary replica
> opens, as an optimization.
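
The gating described in the issue can be pictured with a minimal Java sketch. It is an illustration under stated assumptions, not the code in the patch: `WalEntryType`, `WalEntry`, `SecondaryReplicaGate`, and the `requestPrimaryFlush` callback are hypothetical names, and the marker is assumed to carry the list of store files directly rather than having the secondary list the store directory.

```java
import java.util.ArrayList;
import java.util.List;

/** Kinds of entries a secondary may replay from the primary's WAL (simplified, hypothetical). */
enum WalEntryType { EDIT, FLUSH_MARKER, REGION_OPEN_MARKER }

/** Hypothetical, simplified WAL entry; not the real HBase WAL classes. */
final class WalEntry {
    final WalEntryType type;
    final long seqId;
    final List<String> storeFiles; // files named by a marker; empty for plain edits

    WalEntry(WalEntryType type, long seqId, List<String> storeFiles) {
        this.type = type;
        this.seqId = seqId;
        this.storeFiles = storeFiles;
    }
}

/** Keeps a freshly opened secondary replica from serving reads too early. */
final class SecondaryReplicaGate {
    private boolean readsEnabled = false;
    private long lastMarkerSeqId = -1L;
    private final List<String> knownStoreFiles = new ArrayList<>();

    /** Called when the secondary opens; asks the primary to flush as an optimization. */
    void open(Runnable requestPrimaryFlush) {
        readsEnabled = false;
        requestPrimaryFlush.run();
    }

    /** Replays one entry received from the primary's WAL. */
    void replay(WalEntry entry) {
        switch (entry.type) {
            case EDIT:
                // Edits would go to the memstore only; they do not unlock reads.
                break;
            case FLUSH_MARKER:
            case REGION_OPEN_MARKER:
                // Pick up the files named by the marker, then start serving reads.
                for (String file : entry.storeFiles) {
                    if (!knownStoreFiles.contains(file)) {
                        knownStoreFiles.add(file);
                    }
                }
                lastMarkerSeqId = entry.seqId;
                readsEnabled = true;
                break;
        }
    }

    boolean isReadEnabled() { return readsEnabled; }
    long getLastMarkerSeqId() { return lastMarkerSeqId; }
    List<String> getKnownStoreFiles() { return knownStoreFiles; }
}
```

In this sketch a plain edit never enables reads; only a flush or region open marker does, which mirrors the rule that the secondary must know it has all past data before serving.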
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)